Data mining is the process of extracting relevant information from a collection of data. Information related to a particular concept is mined on the basis of the features of the data. Accessing these features for data retrieval can therefore be termed the feature extraction mechanism. Different types of feature extraction methods are in use. The feature selection algorithm should...
Attribute reduction of an information system is a key problem in rough set theory and its applications. This paper proposes a new feature selection mechanism based on the backward elimination algorithm to solve the attribute reduction problem in rough set theory. It is the most promising technique in rough set theory, a new mathematical approach, to reduce the car and cancer datasets using backward elimination...
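The backward elimination idea can be sketched in Python, assuming Pawlak's dependency degree as the criterion for keeping an attribute. The truncated abstract does not say which criterion the paper uses, so this is an illustration rather than the authors' algorithm. The function names `partition`, `dependency` and `backward_elimination` and the toy table are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers; the keep/drop criterion is assumed to be Pawlak's dependency degree.


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, decision, attrs):
    """Pawlak dependency degree: share of objects in decision-consistent classes."""
    positive = 0
    for block in partition(rows, attrs):
        # A class belongs to the positive region if all its objects share one decision.
        if len({decision[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def backward_elimination(rows, decision, attrs):
    """Drop each attribute in turn if the dependency degree stays at its full value."""
    full = dependency(rows, decision, attrs)
    reduct = list(attrs)
    for a in attrs:
        candidate = [b for b in reduct if b != a]
        if dependency(rows, decision, candidate) == full:
            reduct = candidate
    return reduct


# Toy table: attribute 0 decides the class, attribute 1 is constant, attribute 2 is noise.
rows = [(0, 5, 0), (0, 5, 1), (1, 5, 0), (1, 5, 1)]
decision = ["no", "no", "yes", "yes"]
print(backward_elimination(rows, decision, [0, 1, 2]))  # [0]
```

Because attributes are tried in a fixed order, one pass returns a reduct (no remaining attribute can be dropped), but not necessarily the smallest one; a different order can yield a different reduct.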
Classifier selection aims to reduce the size of an ensemble of classifiers in order to improve its efficiency and classification accuracy. Recently, an information-theoretic view was presented for feature selection. It derives a space of possible selection criteria and shows that several feature selection criteria in the literature are points within this continuous space. The contribution of this paper...
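A continuous space of information-theoretic criteria of this kind is often written (assumed here; the truncated abstract gives no formula) as a score for a candidate feature X_k, given the set S of already selected features and the class Y, with two weights beta and gamma:

```latex
J(X_k) = I(X_k; Y) - \beta \sum_{X_j \in S} I(X_j; X_k) + \gamma \sum_{X_j \in S} I(X_j; X_k \mid Y)
```

With beta = gamma = 0 the score reduces to plain mutual information maximisation (MIM); gamma = 0 with beta = 1/|S| gives mRMR; beta = gamma = 1/|S| gives joint mutual information (JMI).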
An attribute dependency function is very important for feature selection in data mining, pattern recognition and machine learning. However, Pawlak's definition is inadequate for some information systems, and Daisuke's definition applies only to categorical attributes. In this paper, we introduce a new definition based on partitions for numerical attributes. The advantage of the definition is that heterogeneous features...
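For reference, Pawlak's classical dependency degree, the definition the abstract calls inadequate for some information systems, is usually written as follows for an information system with universe U, condition attributes C and decision attributes D, where [x]_B is the equivalence class of x under the indiscernibility relation of an attribute set B. This is the standard textbook form, not the paper's new partition-based definition.

```latex
\mathrm{IND}(B) = \{ (x, y) \in U \times U : a(x) = a(y) \ \forall a \in B \}
\underline{B}X = \{ x \in U : [x]_B \subseteq X \}
\mathrm{POS}_C(D) = \bigcup_{X \in U/D} \underline{C}X
\gamma_C(D) = \frac{|\mathrm{POS}_C(D)|}{|U|}
```

A value of 1 means D depends totally on C; a value of 0 means no object can be classified with certainty from C alone.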
Feature selection is commonly used in bioinformatics applications, such as gene selection from DNA microarray data. Recently, wrapper methods have been proposed as an improvement over the traditionally used filter-based feature selection methods. In wrapper methods, the goodness of a feature set is often measured using the cross-validation performance of a machine learning method trained with the features...
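A wrapper evaluation of the kind described can be sketched as follows. The choices here are assumptions for illustration: a 1-nearest-neighbour classifier as the learning method, k-fold cross-validation with sample i assigned to fold i % k, and a greedy forward search over features. All function names and the toy data are hypothetical.

```python
def nn_predict(train_X, train_y, x, features):
    """1-nearest-neighbour prediction using only the given feature indices."""
    best_label, best_dist = None, float("inf")
    for xi, yi in zip(train_X, train_y):
        d = sum((xi[f] - x[f]) ** 2 for f in features)
        if d < best_dist:
            best_dist, best_label = d, yi
    return best_label


def cv_accuracy(X, y, features, k=3):
    """k-fold cross-validation accuracy; sample i goes to fold i % k."""
    correct = 0
    for fold in range(k):
        train_X = [X[i] for i in range(len(X)) if i % k != fold]
        train_y = [y[i] for i in range(len(X)) if i % k != fold]
        for i in range(fold, len(X), k):
            if nn_predict(train_X, train_y, X[i], features) == y[i]:
                correct += 1
    return correct / len(X)


def forward_wrapper(X, y, k=3):
    """Greedy forward search: add the feature that most improves CV accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {f: cv_accuracy(X, y, selected + [f], k) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break  # no candidate improves the wrapper score
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best


# Feature 0 separates the classes; feature 1 is noise.
X = [(0.0, 3.0), (0.1, 1.0), (0.2, 2.0), (1.0, 1.0), (1.1, 3.0), (1.2, 2.0)]
y = [0, 0, 0, 1, 1, 1]
print(forward_wrapper(X, y))  # ([0], 1.0)
```

Every candidate subset costs a full cross-validation run, which is why wrapper methods are usually much slower than filter methods.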
The development of a feature ranking method that is based on the discriminative power of features and is unbiased towards classifiers is of interest. We have studied a consensus feature ranking method based on multiple classifiers and have shown its superiority to well-known statistical ranking methods. In a target environment such as a medical dataset, missing values and an unbalanced distribution of data must...
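One simple way to build a consensus ranking from several classifiers is to average the rank each classifier assigns to a feature. This mean-rank aggregation is an assumption for illustration, not necessarily the consensus method studied in the paper; the function names and importance scores below are hypothetical.

```python
def ranks(scores):
    """Rank 1 = highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda f: (-scores[f], f))
    result = [0] * len(scores)
    for position, f in enumerate(order, start=1):
        result[f] = position
    return result


def consensus_ranking(score_lists):
    """Order features by their mean rank across several classifiers' importance scores."""
    all_ranks = [ranks(s) for s in score_lists]
    n = len(score_lists[0])
    mean_rank = [sum(r[f] for r in all_ranks) / len(all_ranks) for f in range(n)]
    return sorted(range(n), key=lambda f: (mean_rank[f], f))


# Hypothetical importance scores from three classifiers for three features.
scores = [[0.9, 0.5, 0.1], [0.4, 0.8, 0.2], [0.7, 0.6, 0.3]]
print(consensus_ranking(scores))  # [0, 1, 2]
```

Averaging ranks rather than raw scores keeps one classifier with a large score scale from dominating the consensus.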
The problem of spam detection is a crucial task in web information retrieval systems. The dynamic nature of information resources, as well as the continuous changes in users' information demands, makes web spam detection a challenging topic. So far, researchers from different backgrounds have proposed many different methods to tackle the problem of spam web pages. In...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of "normal" training data points in a chosen...
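The traditional distance-based comparison mentioned in the abstract can be sketched as a k-nearest-neighbour score: the mean distance from a new point to its k closest normal training points. Euclidean distance, k = 2, the threshold and the function names are hypothetical choices; this illustrates the traditional baseline, not the paper's new approach.

```python
from math import dist  # requires Python 3.8+


def knn_anomaly_score(normal, x, k=2):
    """Mean Euclidean distance from x to its k nearest 'normal' training points."""
    distances = sorted(dist(x, p) for p in normal)
    return sum(distances[:k]) / k


def is_anomaly(normal, x, threshold, k=2):
    """Flag x when it lies farther from the normal data than the chosen threshold."""
    return knn_anomaly_score(normal, x, k) > threshold


normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(is_anomaly(normal, (0.5, 0.5), threshold=2.0))    # False
print(is_anomaly(normal, (10.0, 10.0), threshold=2.0))  # True
```

The result depends heavily on the chosen distance and feature scaling, which is the kind of dependence on a chosen space that the abstract attributes to traditional approaches.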
This paper proposes a new feature-selection strategy by integrating the Rough Set Theory (RST) and Particle Swarm Optimisation (PSO) algorithms to generate a set of discriminatory features for the classification problem. The proposed method is seen as a marriage between filter and wrapper approaches in which the RST is used to pre-reduce the feature set before optimisation by PSO, a meta-heuristic...
The data mining and machine learning community is often faced with two key problems: working with imbalanced data and selecting the best features for machine learning. This paper presents a process involving a feature selection technique for selecting the important attributes and a data sampling technique for addressing class imbalance. The application domain of this study is software engineering,...
Text classification is an important research topic in data mining. This article proposes a feature selection method based on mutual information and information entropy pairs (MIIEP_FS), grounded in the theory of information entropy and the concept of an information entropy pair. This method measures the classification effect of each feature using mutual information and shows the extent of the difference between the features...
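The mutual-information part of such a method can be sketched for discrete features (for example, term presence in documents) using I(X; Y) = H(X) + H(Y) - H(X, Y). This covers only mutual-information ranking; the information entropy pair component of MIIEP_FS is not described in the truncated abstract and is not reproduced. Function names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete feature values and class labels."""
    return entropy(feature) + entropy(labels) - entropy(list(zip(feature, labels)))


def rank_by_mi(columns, labels):
    """Return feature indices sorted by decreasing mutual information with the class."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])


labels = ["sport", "sport", "tech", "tech"]
columns = [
    [0, 1, 0, 1],  # term presence independent of the class: I = 0 bits
    [1, 1, 0, 0],  # term present exactly in "sport" documents: I = 1 bit
]
print(rank_by_mi(columns, labels))  # [1, 0]
```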
Feature selection continues to grow in importance in many areas of science and engineering, as large datasets become increasingly common. In particular, bioscience and medical datasets routinely contain several thousands of features. For effective data mining in such datasets, tools are required that can reliably distinguish the most relevant features. The latter is a useful goal in itself (e.g. such...
Most previous research on sentiment analysis concentrates on the binary distinction between positive and negative. This paper presents the multi-class sentiment classification problem, which attempts to mine the implied rating information from reviews. We use four machine learning methods and two feature selection methods to find out whether or not the multi-class sentiment classification problem...
Feature selection has seen growing importance in statistics, pattern recognition, machine learning and data mining. Researchers have demonstrated interest in such methods for improving the performance of their forecasting results. Therefore, this study proposes a feature selection approach based on the minimize entropy principle approach. Experimental results have shown that the proposed...
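A minimal sketch of the minimum entropy idea, assuming it is applied by finding the cut point of a numeric feature that minimises the weighted class entropy of the two resulting regions; a feature whose best split has lower entropy separates the classes better. This version uses plain empirical class frequencies, and the exact formulation in the paper is not given in the truncated abstract. Function names and data are hypothetical.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (bits) of the class distribution; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def min_entropy_threshold(values, labels):
    """Return (weighted entropy, threshold) of the split that minimises class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        s = (i / n) * class_entropy(left) + ((n - i) / n) * class_entropy(right)
        if s < best[0]:
            best = (s, threshold)
    return best


values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(min_entropy_threshold(values, labels))
```

For this data the best cut is 6.5, where both sides are pure and the weighted entropy is 0; ranking features by this minimum entropy gives a simple filter-style score.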
Hepatitis patients need continuous, specialized medical treatment to reduce the mortality rate. Using data from clinical test findings and machine learning technology such as Support Vector Machines (SVM), their life prognosis can be classified and predicted. However, we cannot guarantee that all the feature values in the data are correlated with each other. Therefore, we incorporate...
Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. It is used to find an optimal subset that reduces computational cost, increases classification accuracy and improves the comprehensibility of results. In this paper, a weighted distance learning approach is introduced to minimize the leave-one-out classification error using a gradient descent...
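The objective being minimised can be illustrated with a leave-one-out 1-nearest-neighbour error under a feature-weighted squared Euclidean distance. This sketch only evaluates the objective for given weights: the hard error count is piecewise constant in the weights, so its gradient is zero almost everywhere, and a gradient descent method would need a smooth surrogate that the truncated abstract does not describe. The function name and data are hypothetical.

```python
def loo_error(X, y, w):
    """Leave-one-out 1-NN error rate under the distance sum_k w[k] * (a_k - b_k) ** 2."""
    n = len(X)
    errors = 0
    for i in range(n):
        best_dist, best_label = float("inf"), None
        for j in range(n):
            if j == i:
                continue  # leave sample i out of its own neighbour search
            d = sum(wk * (a - b) ** 2 for wk, a, b in zip(w, X[i], X[j]))
            if d < best_dist:
                best_dist, best_label = d, y[j]
        errors += best_label != y[i]
    return errors / n


X = [(0.0, 3.0), (0.1, 1.0), (0.2, 2.0), (1.0, 1.0), (1.1, 3.0), (1.2, 2.0)]
y = [0, 0, 0, 1, 1, 1]
print(loo_error(X, y, [1.0, 0.0]))  # 0.0: weight only the informative feature
print(loo_error(X, y, [0.0, 1.0]))  # 1.0: weight only the noisy feature
```

Weights that drive this error down tend to be large on informative features and near zero on the rest, which is how learned weights can be read as a feature selection.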
Research on detecting malware using machine learning methods has attracted much attention in recent years. However, most research has focused on code analysis, which is signature-based, or on the analysis of system call sequences in the Linux environment. Naturally, each method has its strengths and weaknesses. In this paper, we concentrate on detecting Trojan horses using operating system information in the Windows environment...
This study investigates the characteristics of the Quantum-inspired Spiking Neural Network (QiSNN) feature selection and classification framework. The self-adapting nature of QiSNN due to the simultaneous optimization of network parameters and feature subsets represents a highly desirable characteristic in the context of machine learning and knowledge discovery. In this paper, the evolution of the...
Craters are important geographical features caused by the impacts of meteoroids. Craters have been widely studied because they contain crucial information about the age and geologic formations of planets. This paper discusses an automated crater-detection framework using knowledge discovery and data mining (KDD) process including sampling, feature selection and creation, and supervised learning methods...
In this work we tackle the problem of search personalization for on-line soft goods shopping. By learning what the user likes and what the user does not like, better search rankings and therefore a better overall shopping experience can be obtained. The first contribution of the work is in terms of feature selection: given the specific nature of the domain, we combine the traditional visual and text...