The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In data mining post-processing, rule selection using objective rule evaluation indices is one of a useful method to find out valuable knowledge from mined patterns. However, the relationship between an index value and experts' criteria has never been clarified. In this study, we have compared the accuracies of classification learning algorithms for datasets with randomized class distributions and...
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy which has low or no information at all about the estimated error of each individual prediction. In many sensitive applications, users should be allowed to associate a measure of reliability to each prediction. In the case of batch systems,...
The reliability of an induced classifier can be affected by several factors including the data oriented factors and the algorithm oriented factors. In some cases, the reliability could also be affected by knowledge oriented factors. In this paper, we analyze three special cases to examine the reliability of the discovered knowledge. Our case study results show that (1) in the cases of mining from...
Research on streaming data classification has been mostly based on the assumption that data can be fully labelled. However, this is impractical. Firstly it is impossible to make a complete labelling before all data has arrived. Secondly it is generally very expensive to obtain fully labelled data by using man power. Thirdly user interests may change with time so the labels issued earlier may be inconsistent...
Sales prediction is an important problem for different companies involved in manufacturing, logistics, marketing, wholesaling and retailing. Food companies are more concerned with sales prediction of products having a short shelf-life and seasonal changes in demand. The demand may depend on many hidden contexts, not given explicitly in the form of predictive features. Even if some changes are known...
In this paper we present a new algorithm for semisupervised clustering. We assume to have a small set of labeled samples and we use it in a clustering algorithm to discover relevant patterns. We study how our algorithm works against two other semisupervised algorithms when the data are multimodal. Then, we study the case where the user is able to produce few samples for some classes but not for each...
The function of proteins in the living cells varies with respect to their localizations. Extracellular plant proteins are responsible for vital functions such as nutrition acquisition, protection from pathogens, communication with other soil organisms, etc. Hence, characterizing these proteins and distinguishing them from intracellular proteins is of high interest to biologists. Nonetheless, the small...
Matrix factorization (MF) based approaches have proven to be efficient for rating-based recommendation systems. In this work, we propose several matrix factorization approaches with improved prediction accuracy. We introduce a novel and fast (semi)-positive MF approach that approximates the features by using positive values for either users or items. We describe a momentum-based MF approach. A transductive...
Ordered information table is one of the most important research areas of granular computing. In this thesis, we introduce multiple decisions ordered information tables based on the concept of ordered information tables. Multiple decisions ordered information tables are used to describe the actual multiple decision attributes situation of reality. We study the process of rule extraction from multiple...
Error-reduction sampling (ERS) is a high performing (but computationally expensive) query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering which populates...
This article introduces ARUBAS, a new framework to build associative classifiers. In contrast with many existing associative classifiers, it uses class association rules to transform the feature space and uses instance-based reasoning to classify new instances. The framework allows the researcher to use any association rule mining algorithm to produce the class association rules. Every aspect of the...
For multi-view learning, existing methods usually exploit originally provided features for classifier training, which ignore the latent correlation between different views. In this paper, semantic features integrating information from multiple views are extracted for pattern representation. Canonical correlation analysis is used to learn the representation of semantic spaces where semantic features...
The performance of user profiling models depends on both the predictive accuracy and the cost of incorrect predictions. In this paper we study whether including contextual information leads to a decrease in the misclassification cost. Several experimental analyses were done by varying the cost ratio, the market granularity and the granularity of context. The experimental results show that context...
This paper presents G-REX, a versatile data mining framework based on genetic programming. What differs G-REX from other GP frameworks is that it doesn't strive to be a general purpose framework. This allows G-REX to include more functionality specific to data mining like preprocessing, evaluation- and optimization methods, but also a multitude of predefined classification and regression models. Examples...
On-line data stream mining has attracted much research interest, but systems that can be used as a workbench for online mining have not been researched, since they pose many difficult research challenges. The proposed system addresses these challenges by an architecture based on three main technical advances, (i) introduction of new constructs and synoptic data structures whereby complex KDD queries...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.