Using airborne full-waveform LiDAR metrics derived by 3-D tree segmentation, this study estimated the diameter at breast height (DBH) and stem volume (STV) of individual trees. Four regression models were used, including multilinear regression and three up-to-date regression models (i.e., least-squares boosting trees regression, random forest, and $\varepsilon$-support vector regression) from the machine learning...
This paper tackles the Romanian syllabification and stress assignment problems, and proposes an efficient machine learning based solution. We show that by designing the appropriate feature sets for each specific problem, learning algorithms achieve satisfactory accuracy rates for both problems (∼92% for syllabification, ∼85% for stress assignment), even for relatively small training set sizes. We...
After lung cancer, breast cancer is known to be the leading cause of death among women [20]. Medical practitioners are placing growing importance on the improving effectiveness of machine learning approaches to breast cancer diagnosis. This paper proposes an effective hybridized classifier for breast cancer diagnosis. The classifier is made by combining an unsupervised artificial neural network...
Network traffic classification continues to attract interest as many applications that use obfuscation techniques emerge on different kinds of networks. Decision trees are a supervised machine learning method widely used to identify and classify network traffic. In this paper, we present a comparative study of two common decision tree methods, namely C4.5 and random forest. The study offers...
The advent of synoptic sky surveys has spurred the development of techniques for real-time classification of astronomical sources in order to ensure timely follow-up with appropriate instruments. Previous work has focused on algorithm selection or improved light curve representations, and has naively converted light curves into structured feature sets without regard for the time span or phase of the light...
The recognition of human activity is a challenging topic for machine learning. We present an analysis of Support Vector Machines (SVM) and Random Forests (RF) in their ability to accurately classify Kinect kinematic activities. Twenty participants were captured using the Microsoft Kinect performing ten physical rehabilitation activities. We extracted the kinematic location, velocity and energy of...
Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported results are unsatisfactory because of their low accuracy. This paper presents a novel approach for...
With the emergence of Web 2.0, sentiment analysis is receiving more and more attention. Several interesting works have addressed different issues in sentiment analysis. Nevertheless, the problem of unbalanced data sets has not been adequately addressed within this research area. This paper presents the study we have carried out to address the problem of unbalanced data sets in supervised sentiment...
Feature reduction is a major problem in data mining. Though traditional methods such as feature ranking and subset selection have been widely used, little consideration has been given to assuring satisfactory performance of a learning machine in relation to the minimum number of features required, or the “critical dimension”. This critical dimension is unique to a specific dataset, learning machine,...
Novel variable interaction measures with random forest classifiers are proposed. The proposed methods efficiently measure the change in classification performance due to non-linear interactions between variables by exploiting random permutation of out-of-bag samples in random forests. They can be readily extended to measure n-subset interactions in multi-class bagging ensembles with any base supervised...
Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device...
Qualitative and quantitative description of functional connectivity graphs using graph attributes is of great interest to neuroscience, and has led to remarkable insights in the field. However, the statistical techniques used have generally been limited to whole-group, post-hoc studies. In this paper, we propose instead a novel approach to perform predictive inference on single subjects. It is based...
In machine learning, non-linear dimensionality reduction (NLDR) is commonly used to embed high-dimensional data into a low-dimensional space while preserving local object adjacencies. However, the majority of NLDR methods define object adjacencies using distance metrics that do not account for the quality of the features in the high-dimensional space. In this paper we present Boosted Spectral Embedding...
Obfuscated and encrypted protocols hinder traffic classification by classical techniques such as port analysis or deep packet inspection. Therefore, there is growing interest in classification algorithms based on statistical analysis of the length of the first packets of flows. Most classifiers proposed in the literature are based on machine learning techniques and consider each flow independently of...
The Human Protein Atlas is a rich source of location proteomics data. In this work, we present an automated approach for processing and classifying major subcellular patterns in the Atlas images. We demonstrate that two different classification frameworks (support vector machine and random forest) are effective at determining subcellular locations; we can analyze over 3500 Atlas images with a high...
Metrics are available for predicting fault-prone classes, which may help software organizations plan and perform testing activities by allocating resources properly to the fault-prone parts of the software's design and code. The importance and usefulness of such metrics is therefore understandable, but empirical validation of these metrics is always a great challenge...
Real-world applications produce a great deal of DNA expression microarray data, and many supervised classification algorithms from the machine learning field, such as decision trees, KNN and SVM, have been introduced for microarray data classification. However, in the real world, labeled examples, especially gene expression data examples, are often very difficult and expensive to obtain. The traditional...