Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data are highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is relatively small. The skewed data problem, also known as class imbalance, poses a unique challenge...
It has been observed that classification on imbalanced data sets has drawn increasing attention from researchers in the knowledge discovery and data mining fields. In such problems, almost all the samples are labeled as one class, while far fewer samples are labeled as the other class, which is usually the more important one. But traditional classifiers that try to pursue overall accurate performance over a full range...
The problem of choosing the best classification algorithm for a specific problem domain has been extensively researched. This issue was also the main motivation behind the ever increasing interest in ensemble methods since 1992. In this paper, we propose a new method for classifiers' fusion, which integrates cascade generalization and voting techniques. The proposed method utilizes two learning algorithms...
Much of the most active research in supervised learning today concerns the integration of several base classifiers into a combined classification system. Such systems are known as multiple classifiers or ensemble methods. This topic attracts the interest of machine learning researchers because multiple classifiers are often much more accurate than the component classifiers that make them up. In...
This paper presents the results of an explorative study on predicting aspects of playing behavior for the major commercial title Tomb Raider: Underworld (TRU). Various supervised learning algorithms are trained on a large-scale set of in-game player behavior data to predict when a player will stop playing the TRU game and, if the player completes the game, how long it will take to do so. Results...
In machine learning classification, the classifier can be described by a set of rules, and the rules can be expressed by fuzzy granules corresponding to fuzzy concepts. In this paper we introduce fuzzy information granulation into the process of building a fuzzy classifier. Furthermore, we present an optimized machine learning classification algorithm based on information granulation. Experiments carried...
Classification on noisy data streams has recently become one of the most important topics in streaming data mining. In this paper, a Classification algorithm for mining Data Streams based on Mixture Models of C4.5 and NB, called CDSMM, is proposed. In this algorithm, C4.5 is used as the base classifier, the hypothesis testing method is introduced for the detection of concept drifts, and a Naïve Bayes...
A new clustering classification approach based on the fuzzy closeness relationship (FCR) is studied in this paper. Fuzzy clustering classification is one of the important and valid methods for knowledge discovery. One of the problems in fuzzy clustering classification is to determine a certain fuzzy sample classification in a given limited sample space. Another is its validity, that is to say, if the...
Most enterprise data is distributed in multiple relational databases with expert-designed schemas. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a "flat" form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a two-phase hierarchical meta-classification...
Lately, many notorious financial distress and bankruptcy events have occurred in the world economy. As is well known, the bankruptcy of Lehman Brothers Holdings Inc. (LEH) in 2008 is the largest bankruptcy filing in U.S. history. These events have seriously impacted the socio-economy and investment in public wealth. To address this problem, this research collected 68 listed companies as the raw data from Taiwan...
Most classifiers suffer from the curse of dimensionality when classifying high-dimensional image data. In this paper, we introduce a new supervised nonlinear dimensionality reduction (S-NLDR) algorithm called evolutionary strategy based supervised dimensionality reduction (ESSDR). The ESSDR method uses a population-based evolutionary strategy (ES) algorithm to find low-dimensional embedded...
Classification Data Mining (DM) techniques can be a very useful tool for detecting and identifying e-banking phishing websites. In this paper, we present a novel approach to overcome the difficulty and complexity of detecting and predicting e-banking phishing websites. We propose an intelligent, resilient, and effective model that is based on association and classification Data Mining algorithms...
Book reviews are comments written by readers about their experiences with a particular book. Some reviews contain useful information and may help prospective buyers make a purchase decision, while others are viewed as less helpful, such as complaints about shipping delays. The review's content is the key to differentiating them. Presenting a methodology for evaluating the helpfulness of a...
The rapid development of the Internet brings a new problem: how to rapidly and effectively retrieve needed web resources from a vast number of web pages. Progress in machine learning techniques shows a new direction for solving this problem. In this paper, an intelligent crawling algorithm based on rough sets is proposed. The algorithm uses the hypertext features behavior in order to perform topic...
A patient-specific seizure prediction algorithm is proposed that uses a classifier to differentiate preictal from interictal ECoG signals. The spectral power of ECoG processed in four different ways is used as features: raw, time-differential, space-differential, and time/space-differential ECoG. The features are classified using cost-sensitive support vector machines by the double cross-validation methodology...
This paper presents a novel approach to overcome the difficulty and complexity of detecting and predicting e-banking phishing websites. We propose an intelligent, resilient, and effective model that is based on association and classification Data Mining algorithms. These algorithms were used to characterize and identify all the factors and rules in order to classify the phishing website and the...
Developing hardware, algorithms and protocols, as well as collecting data in sensor networks are all important challenges in building good systems. We describe a vertical system integration of a sensor node and a toolkit of machine learning algorithms. Based on a dataset that combines sensor data with additional introduced data we predict the number of persons in a closed space. We analyze the dataset...
Collaborative tagging systems have become increasingly popular and have recently achieved widespread success due to their flexibility and conceptual comprehensibility. Recommender systems can adopt tagging systems to achieve better performance. In this paper we consider that the items can be categorized into different classifications in which users show different interests. Here...
Microarray technology today makes it possible to spot the whole genome on a single chip. It allows the biologist to inspect thousands of gene activities simultaneously. Machine learning approaches are well suited to discovering the complex relationships between genes under controlled experimental conditions and to classifying microarray data by identifying a subset of informative genes embedded...
Feature selection and feature weight calculation are key preprocessing steps in text classification. A new feature selection approach based on average interaction gain (AIG) is presented, along with a new feature weight adjustment technique (WA) that takes inter-class and intra-class distribution into consideration. Then a new approach combining AIG with WA, called AIG-WA, is presented. In...