This paper presents a new approach to the problem of semantic segmentation of digital images. We aim to improve the performance of some state-of-the-art approaches for the task. We exploit a new version of the texton feature [28], which can encode image texture and object layout for learning a robust classifier. We propose to use a genetic algorithm for the learning parameters of weak classifiers in a...
The growing scale of server infrastructure in large datacenters has led to an increased need for effective server workload prediction mechanisms. Two main challenges in the server workload prediction task are the lack of large-scale training data and changes in the underlying distribution of server workloads caused by events such as a change in the dominant applications of servers or a change in the allocation of servers, etc...
Real world data mining applications such as Mine Countermeasure Missions (MCM) involve learning from imbalanced data sets, which contain very few instances of the minority classes and many instances of the majority class. For instance, the number of naturally occurring clutter objects (such as rocks) that are detected typically far outweighs the relatively rare event of detecting a mine. In this paper...
Disturbing Neighbors (DN) is a method for generating classifier ensembles. Moreover, it can be combined with any other ensemble method, generally improving the results. This paper considers the application of these ensembles to imbalanced data: classification problems where the class proportions are significantly different. DN ensembles are compared and combined with Bagging, using three tree methods...
Synthetic Minority Oversampling TEchnique (SMOTE) is a popular oversampling method that was proposed to improve random oversampling but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we evaluate the performance of SMOTE on high-dimensional data, using gene expression microarray data. We observe that SMOTE does not attenuate the bias towards the classification...
Learning with imbalanced datasets has been a major topic of study for many years. In this paper, we focus on a type of imbalance called imbalance due to rare instances. Such imbalances occur in a variety of domains. Rare instances have received less focus in prediction problems and we wish to draw attention to how accuracy can be improved in the presence of rare data. We discuss an approach to regression...
In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so the framework is able to search for a desirably sampled training set, thus eliminating...
The goal of class prediction studies is to develop rules to accurately predict the class membership of new subjects. The classifiers differ in the way they combine the values of the variables available for each subject. Frequently the classifiers are developed using class-imbalanced data, where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced...
Computer-Aided Diagnosis (CAD) systems are widely used for detection of various kinds of abnormalities in mammography images. Masses are one type of such abnormality and are mostly characterized by their margin and shape. To classify masses, proper features need to be extracted. However, there are far fewer well-known features for describing margin than geometrical, shape,...
Multi-class classification has become a challenging problem in bioinformatics research. The problem becomes more difficult as the number of classes increases. Decomposing the problem into a set of binary problems can be a good solution in some cases. One of the popular approaches is to build a hierarchical tree structure where a binary classifier is used at each node of the tree. This paper...
Building prediction models for suggestive knowledge from multiple sources dynamically is of great interest from a clinical decision support point of view. This is valuable in situations where the local clinical data repository does not have a sufficient number of records to draw conclusions from. However, due to privacy concerns, hospitals are reluctant to divulge patient records. Consequently, a distributed...
This paper presents a new interactive scatter plot visualization for multi-dimensional data analysis. We apply RST to reduce the visual complexity through dimensionality reduction. We use an innovative point-to-region mouse click concept to enable direct interactions with scatter points that are theoretically impossible. To show the decision trend we use a virtual Z dimension to display a set of linear...
Numerical simulation has become an inevitable tool in most industrial product development processes with simulations being used to understand the influence of design decisions (parameter configurations) on the structure and properties of the product. However, in order to allow the engineer to thoroughly explore the design space and fine-tune parameters, many -- usually very time-consuming -- simulation...
Distributed knowledge has attracted more and more attention as a way to improve knowledge sharing across the world using the Internet. This paradigm enables many systems to interact with each other and share their knowledge while keeping their own ontology. Several researchers have worked on this topic with different strategies but they all argue that the main issue is to make sure that the other...
Even though facial expressions have universal meaning in communications, their appearances show a large amount of variation due to many factors, such as different image acquisition setups, ages, genders, and cultural backgrounds. Because collecting enough annotated samples for each target domain is impractical, this paper investigates the problem of facial expression recognition in...
Despite early success in automatic chord recognition, recent efforts are yielding diminishing returns while basically iterating over the same fundamental approach. Here, we abandon typical conventions and adopt a different perspective of the problem, where several seconds of pitch spectra are classified directly by a convolutional neural network. Using labeled data to train the system in a supervised...
Machine Learning has been used to automatically generate a probabilistic food-web from Farm Scale Evaluation (FSE) data. The initial food web proposed by machine learning has been examined by domain experts and comparison with the literature shows that many of the links are corroborated. The FSE data were collected using two different sampling techniques, namely Vortis and pitfall. The corroboration...
We compare classifiers for the classification of myoelectric signals and show that the performance can be improved by using spatial features that are extracted by independent component analysis. The obtained filters can be interpreted as reflecting the spatial structure of the data source. We find that the performance improves for several preprocessing algorithms, but it affects the relative performance...
More than a decade of research has produced numerous representations and similarity measures to support time series classification and clustering. Yet most of the work in the field is so focused on the representation or similarity measure that it ignores the possibility of improving performance using ensembles of representations or classifiers. This paper explores ways of exploiting representational...
This paper investigates the application of the novel Bidirectional Data Partitioning Technique (BDP) to cancer survival analysis. The author developed this technique for classification problems with unstable feature relevance, and SEER Cancer Data illustrates this machine learning concept. BDP is applied for survival analysis in order to find groups of patients with different key factors that determine survival...