Mining paths of interest to users from Web logs is an important and challenging research topic. Based on an analysis of the advantages and disadvantages of existing algorithms, we propose a new algorithm for discovering such expected Web pages. By computing the probability that a document is recommended to the user, we can mine user-preferred sub-paths. Accordingly, all the sub-paths are merged,...
This paper proposes a reduct computation algorithm for feature selection. It is a discernibility-matrix-based method that aims to reduce the number of irrelevant and redundant features in data mining. The method uses both the significance information of attributes and the information in the discernibility matrix to define the necessity of heuristic feature selection. The advantage of the algorithm...
In this paper, the theory of the natural immune system is first briefly introduced. Several representative artificial immune networks are then discussed, and their principles and learning algorithms are given in detail. Moreover, we demonstrate the applications of these artificial immune networks in the fields of data mining, pattern recognition, and optimization.
Detection of outliers and identification of change points in a data stream are two very exciting topics in the area of data mining. This paper explores the relationship between these two issues, and presents a unifying method for dealing with both of them. This approach is based on a probabilistic model of time series whose parameters are updated adaptively. The forward and backward prediction errors...
In this paper, we explore a new problem of simultaneously mining diagnostic genes and specific phenotypes from microarray data using an unsupervised method. A novel type of cluster called LC-Cluster is proposed to address this problem. The idea behind the solution is motivated by recent biological discoveries and originates from the current bicluster model or emerging patterns, but differs substantially from either...
A rule selection framework is proposed which classifies, selects, and filters out association rules based on the analysis of the rule structures. It was applied to real traffic accident data collected from local police stations. The rudimentary nature of the data required several passes of association rule mining to be performed, each with different sets of parameters, so that semantically interesting...
Homology-related querying on Bio-XML databases poses several problems, as most available exhaustive mining techniques do not incorporate the semantic relationships inherent to these data collections. This paper identifies an index-based approach to mining such data and explores the improvement achieved in the quality of query results by the application of genetic algorithms.
Based on an analysis of the deficiencies of traditional spatial data mining, a framework for spatial data mining with uncertainty is established. Four key problems are analyzed, including uncertainty simulation of spatial data with the Monte Carlo method, spatial autocorrelation measurement, discretization of continuous data based on the neighbourhood EM algorithm and uncertainty assessment of association...
To address the difficulties of vibration fault diagnosis for turbo-generator sets, an intelligent data-mining system based on data acquired in SCADA systems is constructed. The core of the system consists of a focusing quantization algorithm and a reduction algorithm. The focusing quantization algorithm focuses on the transition point from the normal to the abnormal state of variables, the resolution near the focus...
In this paper, a combination-tree algorithm based on an inverted list is presented for mining frequent patterns. Compared with the Apriori and FP-growth algorithms, our algorithm is more efficient. It inserts items one by one using the inverted list to build a frequent tree, then transfers counts between branches to make the branches independent. Our algorithm needs to scan the data set only twice,...
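The abstract above does not spell out its inverted list or combination tree, so the following is only a minimal sketch of the general inverted-list idea, not the authors' algorithm. It assumes the inverted list maps each item to the set of transaction ids containing it, so that the support of an itemset is the size of an intersection. The names `build_inverted_list` and `support` and the sample transactions are hypothetical.

```python
from collections import defaultdict


def build_inverted_list(transactions):
    """Map each item to the set of transaction ids that contain it (one scan)."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def support(index, itemset):
    """Count transactions containing every item of `itemset` by intersecting id sets."""
    tid_sets = [index.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


# Hypothetical toy data, for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]
index = build_inverted_list(transactions)
print(support(index, {"a", "c"}))  # transactions 0, 1 and 3 contain both items
```

Once the index is built, support queries no longer touch the raw transactions, which is one reason inverted lists can reduce the number of data set scans.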
In this paper, we present an entropy partition method and its ideal requirement for the partition result, then apply this method to TCM (Traditional Chinese Medicine) data without any prior knowledge of the expected objectives. We propose the concept of N-class correlation to solve the computational problem. The ideas of RFS (relatives and friends set) and general filtering are...
Pattern-based clustering is widely applied in bioinformatics and biomedicine. Recently, mining high-quality pattern-based clusters has become an important research direction. However, the existing methods are neither efficient on large data sets nor precise in measuring the quality of clusters. These problems have greatly limited the application of these methods to large data sets. This paper proposes a new...
Episode rules can describe and predict the behavior of event sequences. The properties of incremental frequent episode mining are studied and the related lemmas and corollaries are presented; then a general incremental algorithm named IHE for mining frequent episodes is proposed. Moreover, a window-hash-based technique is proposed and used to prune candidate episodes. The performance of the...
Efficiency has been a concern for several years in research on association rule mining. In this paper, a high-dimension-oriented Apriori algorithm is proposed as an improvement on the classical Apriori algorithm. Unlike existing Apriori improvements, our algorithm adopts a new method to reduce the redundant generation of sub-itemsets while pruning the candidate itemsets, which can obtain...
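For context, the classical Apriori baseline that the abstract above improves on can be sketched as follows. This is the textbook join-and-prune procedure, not the high-dimension-oriented variant proposed in the paper, whose details are not given here. The function name `apriori`, the absolute `min_support` threshold, and the sample data are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data, for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]
frequent_itemsets = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is where sub-itemsets of each candidate are regenerated at every level, which is the kind of redundant work that Apriori improvements typically target.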
While most of the existing grid literature considers only the resource pricing strategy, this paper focuses on optimizing grid resource combination, another important issue the grid service providers should address to maximize their profits. We develop an efficient algorithm for computing and maintaining all the frequent demand patterns and dynamically updating them with the incoming grid trade data...
A pattern recognition method specifically for spectral data was developed. As an application, four tea varieties were classified from their near-infrared spectra using the method. Factor analysis (FA) and artificial neural networks (ANN) were used for pattern recognition in this research. FA is a very effective data mining technique; it was applied to enhance species features and reduce data...
To address the issue that user evaluation data are extremely sparse, a user-access matrix based on Web log mining is established, which takes access frequency, browsing time, and page length into consideration. Furthermore, a novel collaborative filtering algorithm based on Web page rating prediction is proposed. This method predicts Web page ratings that users have...