Advances in computing and communication have resulted in very large scale distributed environments in recent years. They are capable of storing large volumes of data and often have multiple compute nodes. However, the inherent heterogeneity of data components, the dynamic nature of distributed systems, the need for information synchronization and data fusion over a network and security and access control...
Matrix factorization (MF) based approaches have proven to be efficient for rating-based recommendation systems. In this work, we propose several matrix factorization approaches with improved prediction accuracy. We introduce a novel and fast (semi)-positive MF approach that approximates the features by using positive values for either users or items. We describe a momentum-based MF approach. A transductive...
For many data mining applications, it is necessary to develop algorithms that use unlabeled data to improve the accuracy of the supervised learning. Co-Training is a popular semi-supervised learning algorithm. It assumes that each example is represented by two or more redundantly sufficient sets of features (views) and these views are independent given the class. However, these assumptions are not...
The ordered information table is one of the most important research areas of granular computing. In this thesis, we introduce multiple decisions ordered information tables based on the concept of ordered information tables. Multiple decisions ordered information tables are used to describe real-world situations with multiple decision attributes. We study the process of rule extraction from multiple...
Sequential pattern mining has become more and more popular in recent years due to its wide applications and the fact that it can find more information than association rules. Two famous algorithms in sequential pattern mining are AprioriAll and PrefixSpan. These two algorithms not only need to scan a database or projected-databases many times, but also require setting a minimal support threshold to...
An association rule (AR) is a common knowledge model in data mining that describes an implicative co-occurring relationship between two disjoint sets of binary-valued transaction database attributes (items), expressed in the form of an "antecedent ⇒ consequent" rule. A variant of the AR is the weighted association rule (WAR). With regard to a marketing context, this paper introduces a...
Relations of logical calculi of association rules to measures of interestingness of association rules are studied. Logical calculi of association rules, 4ft-quantifiers and important classes of association rules are briefly introduced. New 4ft-quantifiers and association rules are defined by applications of suitable thresholds to several known measures of interestingness. It is proved that some of...
The theoretical relationship between association rules and machine learning techniques needs to be studied in more depth. This article studies the use of clustering as a model for association rule mining. The clustering model is exploited to bound and estimate association rule support and confidence. We first study the efficient computation of the clustering model with K-means; we show the sufficient...
The ultimate goal of knowledge discovery (KD) is to extract sets of patterns leading to useful knowledge for obtaining user desirable outcomes. The key characteristic of knowledge usefulness is that these patterns are actionable. In the last decade, KD algorithms such as mining for association rules, clustering, and classification rules, have made tremendous progress and have been demonstrated...
In data mining problems, data is usually provided in the form of data tables. To represent knowledge discovered from data tables, decision logic (DL) is proposed in rough set theory. While DL is an instance of propositional logic, we can also describe data tables by other logical formalisms. In this paper, we use a kind of many-sorted logic, called attribute value-sorted logic, to study association...
In empirical finance, an increase or decrease in the number of stock buy/sell orders is driven by information asymmetry, which eventually affects the stock price. To monitor the change in the stock order flow, we propose a multilayer change-point detection algorithm which makes use of the multi-resolution property of wavelet transformation. We first detect the change-points in...
This paper shows the meaning of Pearson residuals as an indicator of statistical independence. While information granules of statistical independence of two variables can be viewed as determinants of 2×2 submatrices, those of three variables consist of several combinations of linear equations which will become residuals for odds ratio (outer products) when they are equal to 0. Interestingly, the...
Several algorithms have been proposed for maintaining sequential patterns as records are inserted. In addition to record insertion, pattern maintenance for record modification is also very important in real applications. In the past, we have proposed the fast updated sequential pattern tree (called FUSP tree) structure for handling record insertion. In this paper, we attempt to handle...
Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally,...
In this paper we introduce an alternative view of text mining and we review several alternative views proposed by different authors. We propose a classification of text mining techniques into two main groups: techniques based on inductive inference, which we call text data mining (TDM, comprising most of the existing proposals in the literature), and techniques based on deductive or abductive inference,...
If we can estimate the accuracy of our observations then we can estimate the true and false positive rates over a series of samples in high dimensional data mining problems. To date such issues have been largely neglected and previously no algorithm has been provided to facilitate the computations involved. In high dimensional data mining tasks, increasing sparsity leads to decreasing true positive...
Error-reduction sampling (ERS) is a high performing (but computationally expensive) query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering which populates...
This article introduces ARUBAS, a new framework to build associative classifiers. In contrast with many existing associative classifiers, it uses class association rules to transform the feature space and uses instance-based reasoning to classify new instances. The framework allows the researcher to use any association rule mining algorithm to produce the class association rules. Every aspect of the...
Learning classifier systems (LCS) are machine learning systems designed to work for both multi-step and single-step decision tasks. The latter case presents an interesting, though not widely studied, challenge for such algorithms, especially when they are applied to real-world data mining problems. The present investigation departs from the popular approach of applying accuracy-based LCS to data mining...
In this paper a new algorithm, called CStar, for document clustering is presented. This algorithm improves recently developed algorithms like generalized star (GStar) and ACONS algorithms, originally proposed for reducing some drawbacks presented in previous Star-like algorithms. The CStar algorithm uses the condensed star-shaped sub-graph concept defined by ACONS, but defines a new heuristic that...