This paper introduces a sequentially motivated approach to processing streams of images from datasets with low memory demands. We utilize fuzzy clustering as an incremental dictionary learning scheme and explain how the corresponding membership functions can be subsequently used in encoding features for image patches. We focus on replicating the codebook learning and classification stages from an...
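The abstract does not spell out the encoding. As a rough illustration of how fuzzy membership values can become image features, the sketch below assumes the standard fuzzy c-means membership formula with fuzzifier `m` and a codebook already learned by the clustering step. The function names `fuzzy_memberships` and `encode_image`, and the choice to pool patches by averaging, are hypothetical and not taken from the paper.

```python
import math


def fuzzy_memberships(patch, codebook, m=2.0, eps=1e-12):
    """Fuzzy c-means style membership of one feature vector in each codeword.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the Euclidean
    distance from the patch to codeword j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(patch, c) for c in codebook]
    # A patch that coincides with a codeword gets crisp membership in it.
    for j, d in enumerate(dists):
        if d < eps:
            return [1.0 if k == j else 0.0 for k in range(len(codebook))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def encode_image(patches, codebook, m=2.0):
    """Hypothetical pooling: average the membership vectors of all patches."""
    total = [0.0] * len(codebook)
    for patch in patches:
        for j, u in enumerate(fuzzy_memberships(patch, codebook, m)):
            total[j] += u
    return [t / len(patches) for t in total]
```

Memberships for one patch sum to 1, so the averaged descriptor is also a distribution over codewords; a soft encoding like this lets a patch contribute to several nearby codewords instead of only its nearest one.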
The goal of our data-mining multi-agent system is to facilitate data-mining experiments without requiring knowledge of which machine learning method, and which parameters, are most suitable for the data. To replace the expert's knowledge, we propose meta-learning subsystems, including a parameter-space search and method recommendation based on previous experiments. In this paper...
The growing scale of server infrastructure in large datacenters has led to an increased need for effective server workload prediction mechanisms. The two main challenges in the server workload prediction task are the lack of large-scale training data and changes in the underlying distribution of server workloads caused by events such as a change in the dominant applications of servers or a change in server allocation, etc...
In this paper we propose a Sequential Ensemble Classification (SEC) technique, which is designed to tackle the problem of learning from a data set with an extremely unbalanced distribution of instances among the classes. This system employs a specific decomposition technique that reduces the degree of unbalance in the data by transforming a multi-class problem into a sequence of binary class problems...
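The abstract does not say how SEC orders or builds its binary problems. One generic way to turn a multi-class label set into a sequence of binary problems is sketched below, under the assumption that each stage separates one class from all classes not yet handled; the frequency-based default order and the name `sequential_binary_tasks` are hypothetical and not the paper's decomposition.

```python
from collections import Counter


def sequential_binary_tasks(labels, order=None):
    """Split a multi-class problem into a sequence of binary problems.

    Stage k separates class order[k] (label 1) from every class that comes
    later in the order (label 0); samples of classes handled at earlier
    stages are dropped. Returns a list of (class, sample_indices, binary_labels).
    """
    if order is None:
        # Hypothetical default: peel off the most frequent class first.
        order = [cls for cls, _ in Counter(labels).most_common()]
    tasks = []
    remaining = set(order)
    for cls in order[:-1]:  # the last class needs no stage of its own
        idx = [i for i, y in enumerate(labels) if y in remaining]
        binary = [1 if labels[i] == cls else 0 for i in idx]
        tasks.append((cls, idx, binary))
        remaining.discard(cls)
    return tasks
```

With labels `["a"] * 6 + ["b"] * 3 + ["c"]`, the first stage is 6 vs. 4 and the second is 3 vs. 1, so no single stage pits the largest class directly against the smallest one. At prediction time, such stage classifiers would be applied in order, stopping at the first stage that outputs 1.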
Real-world data mining applications such as Mine Countermeasure Missions (MCM) involve learning from imbalanced data sets, which contain very few instances of the minority classes and many instances of the majority class. For instance, the number of naturally occurring clutter objects (such as rocks) that are detected typically far outweighs the relatively rare event of detecting a mine. In this paper...
Synthetic Minority Oversampling TEchnique (SMOTE) is a popular oversampling method that was proposed to improve random oversampling, but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we evaluate the performance of SMOTE on high-dimensional data, using gene expression microarray data. We observe that SMOTE does not attenuate the bias towards the classification...
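SMOTE's core step, interpolating between a minority sample and one of its k nearest minority-class neighbors, can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions: it uses Euclidean distance and draws base samples at random rather than cycling through every minority sample as in the original formulation, and the function name `smote` and its parameters are hypothetical, not a library API.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For a randomly chosen minority sample x, pick one of its k nearest
    minority neighbors x_nn and emit x + gap * (x_nn - x), gap ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbors (excluding itself).
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbors.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbors[i])]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

Every synthetic point lies on the segment between two existing minority samples, and the neighbor search depends entirely on distances computed over all features, which is why the number of dimensions can affect how SMOTE behaves.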
In the last decade, class imbalance has attracted a huge amount of attention from researchers and practitioners. Class imbalance is ubiquitous in Machine Learning, Data Mining and Pattern Recognition applications; therefore, these research communities have responded to such interest with literally dozens of methods and techniques. Surprisingly, there are still many fundamental open-ended questions...
Learning with imbalanced datasets has been a major topic of study for many years. In this paper, we focus on a type of imbalance called imbalance due to rare instances. Such imbalances occur in a variety of domains. Rare instances have received less focus in prediction problems and we wish to draw attention to how accuracy can be improved in the presence of rare data. We discuss an approach to regression...
In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so the framework is able to search for a desirably sampled training set, thus eliminating...
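The framework's wrapper search is not described in enough detail to reproduce, but the two simplest sampling primitives it names, random undersampling (RUS) and random oversampling (ROS), can be sketched for the multi-class case as below. Balancing every class to the minority or majority count is an assumption made for illustration; a wrapper like the one described would instead search over sampling amounts. The function names are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    keep = []
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """ROS: duplicate random samples of each class up to the largest class."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    Xo, yo = list(X), list(y)
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for _ in range(target - len(idx)):
            i = rng.choice(idx)  # sample with replacement
            Xo.append(X[i])
            yo.append(y[i])
    return Xo, yo
```

RUS discards information from the larger classes, while ROS keeps everything but repeats minority samples exactly, which is the weakness SMOTE's interpolation was designed to address.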
In this paper, we propose a multi-class boosting method (multiBoost.imb) to address the difficulties of learning from imbalanced data sets as well as the use of stable base learners. A random resampling strategy is incorporated to diversify the training data set and to recover balance among all classes. Extending AdaBoost by adding an error adjustment parameter, early termination in the training phase...
The goal of class prediction studies is to develop rules to accurately predict the class membership of new subjects. The classifiers differ in the way they combine the values of the variables available for each subject. Frequently the classifiers are developed using class-imbalanced data, where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced...
Multi-class classification has become a challenging problem in bioinformatics research. The problem becomes more difficult as the number of classes increases. Decomposing the problem into a set of binary problems can be a good solution in some cases. One of the popular approaches is to build a hierarchical tree structure in which a binary classifier is used at each node of the tree. This paper...
In the clinical diagnosis of facial dysmorphology, geneticists attempt to identify the underlying syndromes by associating facial features before cyto- or molecular techniques are explored. Specifying genotype-phenotype correlations correctly among many syndromes is labor intensive, especially for very rare diseases. The use of a computer-based prediagnosis system can offer effective decision support...
This paper presents an intelligent steganalysis method to investigate anomalies in Waveform Audio File Format (Wave) files. There are many image and audio file formats that can be used to hide sensitive information without attracting attention. Steganalysis is a set of techniques for revealing secrets hidden in audio, video, or other file formats. Image-based steganalysis is fairly simple because hiding methods such as...
This study proposes a visual approach for classification of multivariate data based on the enhanced separation feature of a visual technique, called Hypothesis-Oriented Verification and Validation by Visualization (HOV3). In this approach, the user first builds up a visual classifier from a training dataset based on its data projection plotted by HOV3 with a statistical measurement of the training...
Even though facial expressions have universal meaning in communication, their appearance shows a large amount of variation due to many factors, such as different image acquisition setups, ages, genders, and cultural backgrounds. Because collecting enough annotated samples for each target domain is impractical, this paper investigates the problem of facial expression recognition in...
Despite early success in automatic chord recognition, recent efforts are yielding diminishing returns while basically iterating over the same fundamental approach. Here, we abandon typical conventions and adopt a different perspective of the problem, where several seconds of pitch spectra are classified directly by a convolutional neural network. Using labeled data to train the system in a supervised...
Hidden Markov models (HMMs) have been widely studied and applied for decades. The standard supervised learning method for HMMs is maximum likelihood estimation (MLE), which maximizes the joint probability of the training data. However, the most natural way of training would be to find the parameters that directly minimize the error rate on a given training set. In this article, we propose a novel learning...
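For reference, the standard supervised MLE baseline mentioned above reduces to counting when the state sequence is labeled: start, transition, and emission probabilities are relative frequencies. The sketch below shows only that baseline, not the proposed error-rate-minimizing method. The function name `hmm_mle`, the optional additive `smoothing`, and the uniform fallback for states with no counts are assumptions for illustration.

```python
from collections import defaultdict


def hmm_mle(sequences, smoothing=0.0):
    """Supervised MLE for a discrete HMM from fully labeled sequences.

    `sequences` is a list of [(state, observation), ...] lists. With
    smoothing=0 the estimates are relative frequencies of start states,
    transitions and emissions, which maximize the joint likelihood of the
    labeled states and observations.
    """
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    states, symbols = set(), set()
    for seq in sequences:
        if not seq:
            continue
        start[seq[0][0]] += 1
        for t, (s, o) in enumerate(seq):
            states.add(s)
            symbols.add(o)
            emit[s][o] += 1
            if t + 1 < len(seq):
                trans[s][seq[t + 1][0]] += 1

    def normalize(counts, support):
        total = sum(counts[x] for x in support) + smoothing * len(support)
        if total == 0:
            # No evidence at all (e.g. a state only ever seen last):
            # fall back to a uniform distribution (an arbitrary choice).
            return {x: 1.0 / len(support) for x in support}
        return {x: (counts[x] + smoothing) / total for x in support}

    pi = normalize(start, states)
    A = {s: normalize(trans[s], states) for s in states}
    B = {s: normalize(emit[s], symbols) for s in states}
    return pi, A, B
```

Because these estimates maximize the joint probability of states and observations, they need not minimize the decoding error rate on the same data, which is the gap the article targets.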
We address the problem of large-scale topic classification of web pages based on the minimal text available in the URLs. This problem is challenging because of the sparsity of feature vectors that are derived from the URL text, and the typical asymmetry between the cardinality of train and test sets due to non-availability of sufficient sets of annotated URLs for training and very large test sets...
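The abstract does not list the features used, but the sparsity problem it describes is easy to see in a typical URL representation. The sketch below assumes word tokens split on non-alphanumeric characters plus character trigrams, after stripping the scheme and a leading `www.`; the function name `url_features` and the `w:`/`c:` feature prefixes are hypothetical choices, not the paper's feature set.

```python
import re
from collections import Counter


def url_features(url, n=3):
    """Sparse bag-of-features for a URL: word tokens plus character n-grams."""
    url = url.lower()
    url = re.sub(r"^[a-z]+://", "", url)   # drop the scheme, e.g. http://
    url = re.sub(r"^www\d*\.", "", url)    # drop a leading www. / www2.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url) if t]
    feats = Counter("w:" + t for t in tokens)
    for t in tokens:
        padded = f"^{t}$"                  # mark token boundaries
        for i in range(len(padded) - n + 1):
            feats["c:" + padded[i:i + n]] += 1
    return feats
```

A URL yields only a handful of tokens, so each vector has very few non-zero entries; character n-grams partially compensate by letting concatenated or unseen words share features with known ones.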
This study presents a novel adaptive control based on a neural network for dc-dc converters. The control method must adapt to changing conditions to obtain high-performance dc-dc converters. In this study, neural network control is adopted to improve the transient response of dc-dc converters. It works in coordination with a conventional PID control to realize a high adaptive...
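The neural network part is not specified here, but the conventional PID loop it works alongside can be sketched as a discrete-time controller whose output (for a converter, typically a duty cycle) is clamped to a valid range. The class name `PID`, the positional form, and the backward-difference derivative are illustrative assumptions, not the study's controller.

```python
class PID:
    """Discrete-time PID controller in positional form with output clamping."""

    def __init__(self, kp, ki, kd, dt, out_min=float("-inf"), out_max=float("inf")):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        """Return the control output (e.g. a duty cycle) for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt                      # I term accumulator
        if self.prev_error is None:
            derivative = 0.0                                   # no history on first sample
        else:
            derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.out_min), self.out_max)        # saturate output
```

The fixed gains `kp`, `ki`, and `kd` are what limit a plain PID when operating conditions change. This sketch also omits anti-windup, so the integral can keep growing while the output is saturated.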