In real-world classification tasks, the original instances are represented by raw features. Usually, domain-specific algorithms are needed to extract discriminative features, but algorithm selection and parameter tuning are difficult for people with little domain knowledge or experience. In this paper, a new machine learning framework called "decompose learning" is proposed...
Recently, the following discrimination-aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that the available historical data is often biased due to discrimination, e.g., when B denotes ethnicity. Using...
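The discrimination measure behind this problem formulation is commonly the gap in positive-outcome rates between the two groups defined by B. A minimal sketch of that score (the helper name and group labels here are illustrative, not from the paper):

```python
# Hedged sketch: difference in positive-outcome rates across attribute B.
# `protected` is an assumed group label, purely for illustration.
def discrimination_score(b_values, outcomes, protected="b1"):
    """P(positive | B != protected) - P(positive | B == protected)."""
    prot = [o for b, o in zip(b_values, outcomes) if b == protected]
    rest = [o for b, o in zip(b_values, outcomes) if b != protected]
    return sum(rest) / len(rest) - sum(prot) / len(prot)

# Group "b2" always gets the positive outcome, "b1" only half the time.
score = discrimination_score(["b1", "b1", "b2", "b2"], [0, 1, 1, 1])
print(score)
```

A score of zero would mean both groups receive positive outcomes at the same rate; the discrimination-aware classifier is asked to keep this score low while preserving accuracy.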
Time-series classification is an active research topic in machine learning, as it finds applications in numerous domains. The k-NN classifier, based on the dynamic time warping (DTW) distance, has been shown to be competitive with many state-of-the-art time-series classification methods. Nevertheless, due to the complexity of time-series data sets, our investigation demonstrates that a single, global...
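The DTW-based nearest-neighbor baseline named above can be sketched in a few lines (a minimal illustration with made-up data, not the paper's implementation):

```python
# Minimal sketch: 1-NN time-series classification under DTW.
# Function names and the toy series are illustrative.

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping with a full cost matrix."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]

def knn_dtw_predict(series, labels, query):
    """Label of the training series closest to `query` under DTW (k=1)."""
    dists = [dtw_distance(s, query) for s in series]
    return labels[dists.index(min(dists))]

series = [[0, 0, 0, 0], [1, 2, 3, 4]]
labels = ["flat", "rising"]
print(knn_dtw_predict(series, labels, [0, 1, 0, 0]))
```

Unlike Euclidean distance, DTW aligns the two series elastically before summing pointwise differences, which is what makes k-NN competitive on warped or shifted time series.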
Random forest is an excellent ensemble learning method, composed of multiple decision trees grown on random input samples, with nodes split on random subsets of features. Due to its good classification and generalization ability, random forest has achieved success in various domains. However, random forest will generate many noisy trees when it learns from data sets of high dimension...
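The two sources of randomness described above (bootstrap samples per tree, random feature subsets per split) are both visible in a standard random forest. A minimal sketch, assuming scikit-learn and a synthetic high-dimensional dataset with only a few informative features:

```python
# Hedged sketch of a random forest on high-dimensional synthetic data.
# Dataset and parameters are illustrative, not from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 50 features, only 5 of which are informative: the setting where
# noisy trees (grown mostly on irrelevant features) start to appear.
X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample; max_features="sqrt" controls the
# random feature subset considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X_tr, y_tr)
accuracy = forest.score(X_te, y_te)
print(accuracy)
```

With many uninformative features, individual trees degrade, which motivates the kind of tree selection or pruning the abstract hints at.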
Most classification studies are done using all of the object data. It is desirable to classify objects using subsets of the total data. A rough set based reduct is a minimal subset of features that has almost the same discernibility power as the entire set of conditional features. Here, we propose a greedy algorithm to compute a set of rough set reducts, which is followed by the k-nearest neighbor...
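One common greedy strategy for rough-set reducts is to add, at each step, the feature that most enlarges the positive region (the objects whose feature values uniquely determine the class). A hedged sketch of that idea, with illustrative names and toy data; the paper's own algorithm may differ:

```python
# Hedged sketch: greedy rough-set reduct via the positive region.
from collections import defaultdict

def positive_region_size(rows, labels, feats):
    """Count objects whose values on `feats` occur with only one class."""
    groups = defaultdict(set)
    for row, lab in zip(rows, labels):
        groups[tuple(row[f] for f in feats)].add(lab)
    consistent = {k for k, labs in groups.items() if len(labs) == 1}
    return sum(1 for row in rows
               if tuple(row[f] for f in feats) in consistent)

def greedy_reduct(rows, labels):
    """Add features until the reduct discerns as well as all features."""
    all_feats = range(len(rows[0]))
    target = positive_region_size(rows, labels, list(all_feats))
    reduct = []
    while positive_region_size(rows, labels, reduct) < target:
        best = max((f for f in all_feats if f not in reduct),
                   key=lambda f: positive_region_size(rows, labels,
                                                      reduct + [f]))
        reduct.append(best)
    return reduct

rows = [(0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]
labels = [0, 0, 1, 1]
print(greedy_reduct(rows, labels))  # feature 0 alone determines the class
```

Greedy selection does not guarantee a minimal reduct, which is presumably why the abstract speaks of computing a *set* of reducts rather than a single one.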
Supervised learning uses a training set of labeled examples to compute a classifier which is a mapping from feature vectors to class labels. The success of a learning algorithm is evaluated by its ability to generalize, i.e., to extend this mapping accurately to new data that is commonly referred to as the test data. Good generalization depends crucially on the quality of the training set. Because...
A semi-supervised approach for classification of network flows is analyzed and implemented. This traffic classification methodology uses only flow statistics to classify traffic. Specifically, a semi-supervised method is employed that allows classifiers to be designed from training data consisting of only a few labeled and many unlabeled flows. The approach consists of two steps, clustering and classification...
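The two-step cluster-then-label idea can be sketched as follows, assuming scikit-learn: cluster all flows (labeled and unlabeled together), then assign each cluster the majority class among its few labeled members. Data and parameters are synthetic and illustrative, not the paper's flow statistics:

```python
# Hedged sketch of semi-supervised cluster-then-label classification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[:20] = True  # only a handful of flows carry labels

# Step 1: cluster every flow, labeled or not.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 2: map each cluster to the majority class of its labeled members.
cluster_to_class = {}
for c in range(km.n_clusters):
    votes = y[labeled & (km.labels_ == c)]
    if votes.size:
        cluster_to_class[c] = int(np.bincount(votes).argmax())

pred = np.array([cluster_to_class[c] for c in km.labels_])
accuracy = (pred == y).mean()
print(accuracy)
```

The unlabeled flows still shape the cluster boundaries, which is what lets so few labels go so far.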
In this paper we study the problem of classifier learning where the input data contains unjustified dependencies between some data attributes and the class label. Such cases arise for example when the training data is collected from different sources with different labeling criteria or when the data is generated by a biased decision process. When a classifier is trained directly on such data, these...
In this paper, a new framework to build an adaptive classifier is introduced. First, a clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN), is applied to a set of sample data to form an initial set of clusters. The clusters are treated as classes. Using a support vector machine (SVM), a classifier model is generated. In real-world applications, data comes in...
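The DBSCAN-then-SVM pipeline can be sketched as follows, assuming scikit-learn; the cluster ids discovered by DBSCAN serve as class labels for the SVM, with noise points dropped. Data and parameters are illustrative, not the paper's:

```python
# Hedged sketch: DBSCAN clusters become classes for an SVM.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=1)

# Step 1: density-based clustering discovers the initial classes.
db = DBSCAN(eps=0.9, min_samples=5).fit(X)

# Step 2: DBSCAN marks noise as -1; train the SVM on the rest,
# using the cluster ids as class labels.
mask = db.labels_ != -1
clf = SVC(kernel="rbf").fit(X[mask], db.labels_[mask])
train_acc = (clf.predict(X[mask]) == db.labels_[mask]).mean()

# New incoming samples are classified against the discovered clusters.
print(clf.predict(X[:5]))
```

The SVM gives the framework a decision boundary for new points, something DBSCAN alone does not provide.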
A learner induced from an imbalanced dataset has a low error rate on the majority class and an undesirably high error rate on the minority class. This paper provides a study of the various methodologies that have tried to handle this problem. Finally, it presents an experimental study of these methodologies together with a proposed grading cost-sensitive ensemble, and it concludes that this ensemble is a more...
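One of the standard cost-sensitive remedies surveyed in this line of work is to reweight training errors inversely to class frequency. A hedged sketch of that baseline, assuming scikit-learn; this is a common technique, not the grading ensemble the paper proposes, and the data is synthetic:

```python
# Hedged sketch: cost-sensitive learning on an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ~95% majority, ~5% minority.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" makes mistakes on the rare class cost more
# during training, countering the bias toward the majority class.
clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
minority_recall = recall_score(y_te, clf.predict(X_te))
print(minority_recall)
```

Plain accuracy is misleading here (predicting the majority class everywhere already scores ~95%), so minority-class recall is the quantity worth reporting.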
The k-nearest neighbor (k-NN) classifier is improved by applying rough sets and distance functions with relearning and ensemble computations, to classify data with higher accuracy. The proposed relearning and ensemble computations are an effective technique for improving accuracy. We develop a new approach to combining k-NN classifiers based on rough sets and distance functions with relearning...
This work systematically examines a Clustering Inside Classes (CIC) approach to classification. In CIC, each class is partitioned into subclasses based on cluster analysis. We find that CIC, by extracting local structure and producing compact subclasses, can improve the performance of linear classifiers such as SVM and logistic regression. It is compared against a global classifier on four benchmark datasets...
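The CIC idea above can be sketched as follows, assuming scikit-learn: split each class into subclasses with k-means, train a linear classifier on the subclass labels, and map predictions back to the parent classes. Data, subclass count, and names are illustrative:

```python
# Hedged sketch of Clustering Inside Classes (CIC) with a linear SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Two classes, each made of two sub-blobs (local structure a single
# linear boundary cannot capture).
X, blob = make_blobs(n_samples=400, centers=4, random_state=0)
y = blob % 2  # blobs 0,2 -> class 0; blobs 1,3 -> class 1

sub_labels = np.empty(len(X), dtype=int)
parent = {}
next_id = 0
for c in np.unique(y):
    idx = np.flatnonzero(y == c)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[idx])
    sub_labels[idx] = km.labels_ + next_id
    for s in range(2):
        parent[next_id + s] = int(c)  # remember each subclass's class
    next_id += 2

# Train on compact subclasses, then fold predictions back to classes.
svm = LinearSVC(max_iter=10000).fit(X, sub_labels)
pred = np.array([parent[int(s)] for s in svm.predict(X)])
accuracy = (pred == y).mean()
print(accuracy)
```

The linear model now only needs to separate compact subclasses, which is exactly the mechanism by which CIC helps linear classifiers on multimodal classes.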