Feature selection is the process of selecting a subset of relevant features from the larger set of collected features. As the amount of available data grows with technology, feature selection becomes a more important part of the system-design process. In real-world applications, there are several costs associated with the collection, processing, and storage of data. Given that these costs can vary...
Employee churn prediction, which is closely related to customer churn prediction, is a major issue for companies. Despite the importance of the issue, it has received little attention in the literature. In this study, we applied well-known classification methods, including Decision Tree, Logistic Regression, SVM, KNN, Random Forest, and Naive Bayes, to the HR data. Then, we analyze the results...
This paper presents a predictive model to predict the trends of stock prices using data mining techniques. This research will allow the investor to make a more informed decision about buying and selling stocks, and about the most appropriate period to do so. The predictive concept in this work implies learning historical price patterns, indicators, and behavior, and then predicting the future trends in one, five,...
This paper presents an application of educational data mining to predict undergraduate retention. The research provides valuable insight about data feature ranking, algorithm selections and validation methods based on unique types of data that come from educational settings. The data from a cohort of 972 students enrolled in 2008 at Embry-Riddle Aeronautical University (ERAU) were used to train and...
Feature selection is the process of choosing a subset of the available features or attributes from a given dataset in order to make the process of building a predictive model more efficient and accurate. The selection of attributes is, most of the time, done sequentially. In this paper we propose a new filtering strategy that selects the attributes in a composite way rather than sequential...
Fasting blood glucose (FBG) is an important indicator of human health. Predicting FBG is useful for detecting and treating diseases, especially diabetes mellitus. Based on four years of historical medical examination data, a prediction model of the coming year's FBG is presented using traditional data mining techniques with a novel algorithm to estimate the FBG change probability and a proposed...
For four decades, sincere concern has arisen in managerial and professional circles about the satisfaction of teaching-learning objectives in academia. A great deal of time has already been spent revealing students' profile patterns using predictive modeling methods; however, very little effort has been put into identifying the causative features responsible for varied students' performances followed by decisive...
Classification of emotion from sentences requires the classifier to be trained on relevant features. This paper focuses on four features: (a) Bag-of-Words, (b) Part-of-Speech tags, (c) Sentence Length, and (d) Lexical Emotion Features. Extensive evaluation on variable feature length for classifying textual emotions is carried out to understand their role in model performance. Experiments show that...
Configuration bugs are one of the dominant causes of software failures. Previous studies show that a configuration bug could cause huge financial losses in a software system. The importance of configuration bugs has attracted various research studies, e.g., to detect, diagnose, and fix configuration bugs. Given a bug report, an approach that can identify whether the bug is a configuration bug could...
Research on the temporary staffing industry discusses different topics ranging from workplace safety to the internationalization of temporary labor. However, there is a lack of data mining studies concerning this topic. This paper fills this void and uses a financial dataset as input for the estimated models. Bagged decision trees were utilized to cope with the high dimensionality. Two bagged decision...
In this research we conducted an experiment with two feature selection methods (eta squared and stepwise selection) on two classification models, a back-propagation neural network (BPNN) and a general regression neural network (GRNN), to study their effects on the correctness of firm bankruptcy classification. The correctness includes the average classification correctness and the power of bankruptcy classification...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of "normal" training data points in a chosen...
The data mining and machine learning community is often faced with two key problems: working with imbalanced data and selecting the best features for machine learning. This paper presents a process involving a feature selection technique for selecting the important attributes and a data sampling technique for addressing class imbalance. The application domain of this study is software engineering,...
This paper presents a method for forecasting changes in intraday stock prices by text mining stock news. This method is based on text mining techniques coupled with rough set theory and a support vector machine classifier. The method readily handles unstructured news about the Taiwan stock market through preprocessing, feature selection, and marking. The method also extracts the...
Predicting conceptual change in a scientific inquiry learning environment is not trivial because of the challenges that arise when eliciting a student's implicit properties. These challenges can be more complicated when such a learning environment employs an exploratory learning approach. One plausible way to tackle them is to employ a data mining approach. In this study, 129 interaction logs...
The traditional data-driven data mining process suffers from huge gaps between efficient algorithms and intelligent tools, as well as from the invalidity of the knowledge it produces. Meanwhile, each data item in the earth science field carries a concrete physical meaning. If there is no corresponding domain knowledge involved in the mining process, the information...
In this paper, we study the learning impact of data sampling followed by attribute selection on the classification models built with binary class imbalanced data within the scenario of software quality engineering. We use a wrapper-based attribute ranking technique to select a subset of attributes, and the random undersampling technique (RUS) on the majority class to alleviate the negative effects...
Data mining is the use of algorithms to extract the information and patterns derived by the knowledge discovery in databases process. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. In many data mining applications that address classification problems, feature and model selection...
Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set...
Prediction of the prosodic phrase boundary has a strong influence on the performance of speech recognition and voice synthesis systems. We propose a statistical approach using efficient learning features for the natural prediction of the Korean prosodic phrase boundary. These new features reflect factors that affect the generation of the prosodic phrase boundary better than existing learning features...