The Random Forests algorithm belongs to the class of ensemble learning methods, which are commonly used in classification problems. In this paper, we study the problem of adapting the Random Forests algorithm to learn from raw data in real usage scenarios. An improvement that is stable, strict, highly efficient, data-driven and problem-independent, and that has no impact on algorithm performance, is proposed to...
Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, extending well-known methods for many other classical pattern types to sequences is not a trivial task. In this paper we study the notion of δ-freeness for sequences. While this notion has been discussed extensively for itemsets, this work is the first to extend it to sequences. We...
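The abstract is cut off before its sequence definition, so the sketch below shows only the classical itemset notion it starts from, as a reference point: a rule X ⇒ a is δ-strong when supp(X) − supp(X ∪ {a}) ≤ δ, and an itemset is δ-free when no δ-strong rule can be formed from its items. Support is counted as an absolute number of transactions; the function names and the toy database are hypothetical, and this is not the paper's sequence extension.

```python
def support(itemset, transactions):
    """Absolute support: number of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def is_delta_free(itemset, transactions, delta):
    """Return True if no rule X => a built from the items of itemset is delta-strong,
    i.e. none satisfies supp(X) - supp(X | {a}) <= delta.

    Checking only the immediate subsets (itemset minus one item a) is enough:
    a larger antecedent can only shrink the set of transactions that contain the
    antecedent but lack a, so if any X => a is delta-strong, so is
    (itemset minus a) => a.
    """
    z = frozenset(itemset)
    supp_z = support(z, transactions)
    return all(support(z - {a}, transactions) - supp_z > delta for a in z)


# Toy transaction database (hypothetical data).
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]
ab_free_0 = is_delta_free({"a", "b"}, transactions, delta=0)  # True
ab_free_1 = is_delta_free({"a", "b"}, transactions, delta=1)  # False
ac_free_0 = is_delta_free({"a", "c"}, transactions, delta=0)  # False: c => a always holds
```

With δ = 0 this reduces to the familiar free (generator) itemsets; raising δ tolerates up to δ exceptions before a rule counts as strong.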
Many real-world networks undergo dynamic changes, such as the addition of new nodes and edges and the modification of node content. Because changes are continuously introduced to the network in a streaming fashion, we refer to such dynamic networks as streaming networks. In this paper, we propose a new classification method for streaming networks, namely streaming network node classification (SNOC). For...
Clinical databases hold tremendous amounts of data about patients and their associated clinical histories. Here, data mining plays a vital role in searching for patterns within huge clinical datasets that could provide a useful knowledge base for efficient and effective decision-making. Classification is a widely used data mining tool employed in healthcare applications to facilitate...
In response to the urgent need for learning tools tuned to big data analytics, the present paper introduces a feature selection approach for efficient clustering of high-dimensional vectors. The resulting method leverages random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to yield novel dimensionality reduction schemes. The advocated...
For the classification of high-dimensional data, feature selection is the most important step for obtaining optimal results with respect to the processing power and time required. Feature selection is a method by which the most relevant features are selected from a set containing redundant and irrelevant features, thereby reducing the load on the classification algorithm. This paper proposes...
For four decades, managers and professionals have shown sincere concern for meeting teaching-learning objectives in academia. A great deal of time has already been spent revealing students' profile patterns using predictive modeling methods; however, very little effort has been put into identifying the causative features responsible for differences in students' performance, followed by decisive...
The diagnosis of incipient faults is important for power transformer condition monitoring. Incipient faults are monitored with conventional and artificial intelligence (AI) based models. The key gases, the percentage values of the gases and the ratios of the Doernenburg, Rogers and IEC methods serve as input variables to AI models and affect the accuracy of incipient fault diagnosis, so selection of...
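To make the input variables named above concrete, the sketch below turns one dissolved-gas sample into a feature vector: each key hydrocarbon gas and hydrogen as a percentage of their total, plus the three IEC 60599 ratios (C2H2/C2H4, CH4/H2, C2H4/C2H6). The choice of five gases as the percentage base is an assumption, the Doernenburg and Rogers ratio sets are left out for brevity, and the ppm values are invented. Which of these features the paper finally selects is not visible in the truncated abstract.

```python
# Key dissolved gases; concentrations are given in ppm.
KEY_GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2")


def safe_ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def dga_features(ppm):
    """Build a feature dict: each key gas as a percentage of the five-gas total
    (an assumed base), plus the three IEC 60599 gas ratios."""
    total = sum(ppm[g] for g in KEY_GASES)
    features = {"%" + g: (100.0 * ppm[g] / total if total else 0.0) for g in KEY_GASES}
    features["C2H2/C2H4"] = safe_ratio(ppm["C2H2"], ppm["C2H4"])
    features["CH4/H2"] = safe_ratio(ppm["CH4"], ppm["H2"])
    features["C2H4/C2H6"] = safe_ratio(ppm["C2H4"], ppm["C2H6"])
    return features


# Hypothetical oil sample (ppm values invented for illustration).
sample = {"H2": 100, "CH4": 50, "C2H6": 25, "C2H4": 20, "C2H2": 5}
features = dga_features(sample)
```

Returning None for an undefined ratio, rather than 0 or infinity, leaves it to the downstream model to decide how missing ratios are encoded.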
Condition monitoring is a vital task in the maintenance of industrial machines. This paper proposes a reliable condition monitoring method using a genetic algorithm (GA) that selects the most discriminative features by means of a transformation matrix. Experimental results show that the features selected by the GA outperform original and randomly selected features using the same k-nearest neighbor (k-NN)...
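A GA for feature selection needs a fitness function that scores each candidate. Since the abstract evaluates selected features with a k-NN classifier, one natural choice is k-NN accuracy on the candidate subset. The sketch below assumes a simplified encoding (a binary mask over features rather than the paper's transformation matrix) and uses leave-one-out accuracy as the fitness; the function names and toy data are hypothetical, and the GA loop itself is omitted.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, mask):
    """Classify x by majority vote of its k nearest training points, using
    Euclidean distance over only the features whose mask bit is 1."""
    dists = []
    for xi, yi in zip(train_X, train_y):
        d = math.sqrt(sum((a - b) ** 2 for a, b, m in zip(xi, x, mask) if m))
        dists.append((d, yi))
    dists.sort(key=lambda p: p[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_fitness(X, y, mask, k=1):
    """Leave-one-out k-NN accuracy of the feature subset encoded by mask.
    An empty subset gets fitness 0 so the GA never prefers it."""
    if not any(mask):
        return 0.0
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k, mask) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is noise (hypothetical).
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.2]]
y = [0, 0, 0, 1, 1, 1]
fit_informative = loo_fitness(X, y, [1, 0])  # 1.0
fit_noise = loo_fitness(X, y, [0, 1])        # 0.0
```

A GA would then evolve a population of masks, keeping those with the highest `loo_fitness`.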
Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce storage space when only the speech segments of audio documents are needed, for example for content analysis or spoken language identification. In this work, we experimented with the use of time-domain, frequency-domain and cepstral...
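The abstract is truncated before it lists its features, so as an illustration only, the sketch below computes two standard time-domain features often used to separate speech from non-speech: short-time energy and zero-crossing rate per frame. The default frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz rate) and all names are assumptions, not the paper's configuration.

```python
def frames(signal, frame_len, hop):
    """Split the signal into possibly overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs at least 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def time_domain_features(signal, frame_len=400, hop=160):
    """One (energy, zero-crossing rate) pair per frame."""
    return [(short_time_energy(f), zero_crossing_rate(f)) for f in frames(signal, frame_len, hop)]


# Tiny hand-made signal: an oscillating frame followed by a silent one.
signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
feats = time_domain_features(signal, frame_len=4, hop=4)  # [(0.25, 1.0), (0.0, 0.0)]
```

Voiced speech tends to show high energy with a moderate zero-crossing rate, while silence shows low energy, which is why these two cheap features are a common first stage in SND.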
This paper proposes a framework for preparing and using corpora from online social networks and review sites for the sentiment analysis task. The framework consists of three phases. The first phase is preprocessing and cleaning the collected data, followed by data annotation. The second phase applies various text processing techniques, including removing stopwords, replacing negation words and the...
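The text processing steps named above (stopword removal and negation-word replacement) can be sketched as follows. The stopword and negation lists are tiny hypothetical placeholders, and the negation scheme shown (dropping the negation word and prefixing the next content word with `NOT_`) is one common convention assumed here; the abstract does not say what the negation words are replaced with.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and"}
NEGATIONS = {"not", "no", "never"}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and replace each negation word by a
    NOT_ prefix on the next non-stopword token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    out = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS or tok.endswith("n't"):
            negate = True  # negation word itself is dropped
            continue
        if tok in STOPWORDS:
            continue  # stopwords are skipped; a pending negation carries over
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


example_1 = preprocess("The movie was not good")  # ['movie', 'NOT_good']
example_2 = preprocess("It isn't a bad phone")    # ['NOT_bad', 'phone']
```

Attaching the negation to the following word keeps "good" and "NOT_good" as distinct features for the sentiment classifier.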
Customer Relationship Management (CRM) embodies Business Intelligence by incorporating information acquisition, information storage and decision support functions to provide customized customer service. It enables customer representatives to analyze and classify data to address customer needs, promoting greater customer satisfaction and retention, but in reality we have learned CRM classification...
Labelling maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering) that favors clusters with a maximal feature F-measure. In this paper, we show that adapting this metric to supervised classification makes it possible to select features and to compute a contrast function for each of them. The method...
In today's networked environment, massive volumes of data are being generated, gathered and stored in databases across the world, and this trend is growing very fast year after year. Today it is normal to find databases with terabytes of data in which vital information and knowledge are hidden. Mining the hidden information in such databases is not feasible without efficient mining techniques for extracting...
A microarray represents thousands of gene expression levels across a few samples. Determining an optimal set of features from such a high-dimensional dataset requires a good feature selection method. Based on the statistical significance of the features, insignificant genes can be eliminated. However, such methods lack biological validation. In this paper we propose a method where...
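The statistical-significance filtering that the abstract describes as the baseline (before its proposed biological validation) can be sketched as a per-gene two-sample test. Here each gene is scored by the absolute Welch t-statistic between two sample classes and only the top-scoring genes are kept; ranking by |t| is used as a simple stand-in for p-value thresholding, and the names and data are hypothetical. This is not the paper's proposed method.

```python
import statistics


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic (each group needs at least 2 values)."""
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    se = (va / len(group_a) + vb / len(group_b)) ** 0.5
    return (ma - mb) / se if se else 0.0


def rank_genes(expr, labels, top_k):
    """expr[g] holds the expression values of gene g across samples and
    labels[j] is 0 or 1 for sample j. Returns the indices of the top_k genes
    with the largest |t|, i.e. the remaining genes are eliminated."""
    scores = []
    for g, values in enumerate(expr):
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores.append((abs(welch_t(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]


# Toy data: 3 genes x 6 samples (hypothetical values).
labels = [0, 0, 0, 1, 1, 1]
expr = [
    [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],  # strongly differentially expressed
    [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # weak difference
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.2],  # no difference in means
]
top_genes = rank_genes(expr, labels, top_k=2)  # [0, 1]
```

Such a filter looks only at each gene in isolation, which is the kind of limitation that motivates adding a biological validation step.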
Schizophrenia is a serious psychiatric illness that requires early and accurate diagnosis. Differences in the activation patterns of schizophrenia patients and healthy subjects can be identified with the help of functional magnetic resonance imaging (fMRI). However, manual diagnosis using fMRI depends on subjective observation and may be erroneous. This has motivated the pattern recognition and machine learning...
Local appearance descriptors are widely used in facial emotion recognition tasks. With these descriptors, image filters such as Gabor wavelets or local binary patterns (LBP) are applied to the whole face or to specific regions of it to extract facial appearance changes. But it is also clear that, besides the feature descriptor, the choice of a suitable learning method that integrates feature novelty is vital. The...
In our previous study, a grouping-genetic-algorithm-based (GGA-based) attribute clustering process was proposed for grouping features. In this paper, we further improve its performance and propose a center-based GGA for attribute clustering (CGGA). A new encoding scheme with corresponding crossover and mutation operators is designed, and an improved fitness function is proposed to achieve better...
The aim of this study is to apply an automatic speech recognition (ASR) mechanism to increase the amount of information extracted from the voice and to improve the accuracy of the system by selecting highly discriminative features from among different types of acoustic features. For feature extraction, we applied three techniques: Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction...
In multiple classifier systems, base classifiers are trained on a set of training data to be accurate and diverse. Generating this training data is a necessary and important step in building a classifier ensemble, and it can be achieved by instance selection (IS) or feature selection (FS) on the initial data. In this paper, a feature-prior FS-IS hybrid ensemble method is proposed by integrating feature selection with...
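The abstract is truncated before it describes its selection criteria, so the sketch below only illustrates the "feature-prior" ordering it names: for each base classifier, a feature subset is chosen first and an instance subset second, producing diverse training sets. Random feature subsets (as in the random subspace method) and random subsampling without replacement are assumed stand-ins for the paper's FS and IS steps; the function name and data are hypothetical.

```python
import random


def make_training_sets(X, y, n_sets, n_features, n_instances, seed=0):
    """Return n_sets tuples (feature_indices, instance_indices, X_sub, y_sub).
    Features are selected first, then instances (feature-prior order)."""
    rng = random.Random(seed)
    d = len(X[0])
    sets = []
    for _ in range(n_sets):
        feats = sorted(rng.sample(range(d), n_features))       # stand-in for FS
        idx = sorted(rng.sample(range(len(X)), n_instances))   # stand-in for IS
        X_sub = [[X[i][f] for f in feats] for i in idx]
        y_sub = [y[i] for i in idx]
        sets.append((feats, idx, X_sub, y_sub))
    return sets


# Toy data: 10 instances with 4 features each (hypothetical values).
X = [[i, 2 * i, 3 * i, 4 * i] for i in range(10)]
y = [i % 2 for i in range(10)]
training_sets = make_training_sets(X, y, n_sets=3, n_features=2, n_instances=5)
```

Each tuple would train one base classifier; the differing feature and instance subsets are what supply the ensemble's diversity.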