Detecting sequential frequent itemsets is one of the core problems in data mining. In this paper we propose a new methodology based on our previous work on detecting all repeated patterns in a string. By analyzing large datasets from the FIMI website, with up to one million transactions, we were able to detect not only the most frequent sequential itemsets but any sequential itemset occurred...
The lack of effective usage examples in API documentation has been shown to be a major obstacle to API learning. To deal with this issue, several approaches have been proposed to automatically extract usage examples from client code or related web pages, both of which are unfortunately unavailable for newly released API libraries. In this paper, we propose a novel approach to mining API usage examples from test...
In the real world, the distribution of a data stream changes over time. This problem is usually called concept drift [1]. The state-of-the-art decision tree classification method CVFDT [2] handles concept drift well, but its efficiency suffers because it processes instances in a general way, without considering the type of concept drift. In this paper, an algorithm called...
The main differences among associative classification algorithms lie in how they mine frequent itemsets, analyze the resulting rules, and use them for classification. This paper presents a Trie-tree-based associative classification algorithm named CARPT, which removes the frequent items that cannot generate frequent rules directly by adding the count of class labels. We also compress the storage of the database...
Spatial co-location patterns are similar to association rules but rely more heavily on spatial autocorrelation. They represent subsets of Boolean spatial features whose instances are often located in close geographic proximity. Existing research on co-location pattern mining considers only spatial attributes, and little of it can handle the huge amount of non-spatial attributes in spatial datasets...
Emerging cloud environments are well suited to storing and analyzing large datasets, since they allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we developed a middleware called FREERIDE (FRamework for Rapid Implementation of Data mining Engines). FREERIDE is based upon the...
A concept lattice is accurate and complete in knowledge representation and is an effective tool for data analysis and knowledge discovery. This paper focuses on incremental computation of the intent reduction of concepts. Through theoretical analysis of how the intent reduction of lattice nodes changes during incremental construction of a concept lattice, it proposes an incremental algorithm to compute...
To improve the efficiency of mining multi-dimensional association rules in relational databases, this paper analyzes the Apriori algorithm and the BUC algorithm in practice. It then presents the DGP algorithm, an improved Apriori algorithm based on multi-dimensional association rules, which is more efficient and is intended for use in relational databases. Finally, it was applied to analyzing...
By analyzing the classification process and the MapReduce computing paradigm, we find that the parallel and distributed computing model of MapReduce is appropriate for constructing classifier models. This paper presents a MapReduce algorithm for parallel and distributed classification, aiming to reduce the computation time of training on large-scale document collections. Our experiment...
Sensor subset selection is considered one of the significant issues in machine olfaction. In principle, each sensor should provide a different selectivity profile over the range of the target odor application so that a unique odor pattern is produced from each sensor in the array. However, in practice some or most of the features obtained from a sensor array are redundant and irrelevant due...
This paper presents a triangulation algorithm for general planar polygons. The algorithm works regardless of whether the polygon is concave or convex and whether its vertices are ordered clockwise or counterclockwise. It first marks for elimination the diagonals that lie outside the polygon, then identifies the diagonals that intersect the polygon and marks them for elimination as well. In order to avoid the long and...
In recent years, sequential pattern mining has been studied extensively in various domains. Most existing algorithms find patterns in transactional databases by scanning the records to check whether they contain the patterns. This paper proposes a novel algorithm for mining closed sequential patterns using an inverted matrix and a prefix-based sequence element matrix. The inverted matrix minimizes the...
Text clustering is a popular and essential topic in data mining and information retrieval. This paper proposes KP-FCM, a clustering method that uses key phrases as text features and applies fuzzy c-means (FCM) as the clustering algorithm. In this method, key phrases are extracted by a suffix-array-based algorithm. Experimental results on two standard text clustering benchmark corpora, OHSUMED...
Large scientific data archives manage and store huge quantities of data, handle this data throughout its life cycle, and focus on particular scientific domains. Metadata can be used to assist information retrieval. Using metadata to represent the file system also reduces the processing required to handle operations. While the number of metadata files grows daily with the...
One of the core technologies in smart antennas (SA) is direction-of-arrival (DOA) estimation. Current DOA estimation methods fall into three basic categories: spectrum searching algorithms, subspace algorithms, and algorithms for best performance. All three categories have limitations and cannot be applied directly in CDMA systems. This paper proposes a new, simple, and practical method for DOA...
An algorithm is proposed for computing the convex hull of a scattered planar point set using the extreme points on the boundary of the plane. Based on the extreme points, the point set is divided into five zones. The four zones on the boundary contain all convex hull vertices. By computing the extreme points of the subsets in the four marginal zones, a polygon that contains all convex hull vertices is obtained...
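The idea of using extreme points to set aside a central zone can be illustrated with a minimal Python sketch. This is not the algorithm of the abstract above, whose details are truncated; it is a well-known stand-in, an Akl-Toussaint-style prefilter followed by Andrew's monotone chain. The function names `cross`, `monotone_chain`, and `hull_with_extreme_point_filter` are hypothetical, and the sketch assumes points are given as `(x, y)` tuples.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def monotone_chain(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_with_extreme_point_filter(points):
    """Discard points strictly inside the polygon of extreme points, then build the hull.

    Assumption: this prefilter stands in for the 'central zone' of the abstract;
    the remaining points play the role of the four boundary zones.
    """
    pts = list(set(points))
    if len(pts) < 4:
        return monotone_chain(pts)
    left, right = min(pts), max(pts)
    bottom = min(pts, key=lambda p: (p[1], p[0]))
    top = max(pts, key=lambda p: (p[1], p[0]))
    quad = monotone_chain([left, bottom, right, top])
    if len(quad) < 3:  # all extreme points collinear: nothing to filter
        return monotone_chain(pts)

    def strictly_inside(p):
        n = len(quad)
        return all(cross(quad[i], quad[(i + 1) % n], p) > 0 for i in range(n))

    survivors = [p for p in pts if not strictly_inside(p)]
    return monotone_chain(survivors)
```

A point strictly inside a polygon whose corners are input points cannot be a hull vertex, so the prefilter never changes the result; it only reduces the work left for the hull step.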
In this study, we propose a method for automatically locating predefined events in digital images of old paper-tape data recordings, which is in essence indirect processing. The main idea of the proposed algorithm is an isomorphic transformation of the paper-tape digital images into time-series data. The time-series data obtained by this transformation are clustered and classified to obtain the positions...
As the amount of information on the Internet increases dramatically, web search engines have become indispensable tools for searching for and locating the required information. Web snippet clustering can classify the search results and help users narrow the search scope. This paper presents an online clustering algorithm for Chinese web snippets using common substrings. The algorithm first preprocesses the...
Based on a layered policy representation framework, several types of policy conflict are proposed by studying the characteristics of each policy layer and the relationships among policies. These types of policy conflict are then analyzed in depth, and a variety of algorithms for detecting and eliminating policy conflicts are put forward. Finally,...
The Jenks natural breaks algorithm is a standard method for dividing a dataset into a given number of homogeneous classes. The algorithm is commonly used in geographic information systems (GIS) applications. One major drawback of using Jenks in this context is that the desired number of classes must be specified before the algorithm is applied to the dataset. Without a mechanism for determining...
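The dependence on a preset number of classes can be seen in a minimal Python sketch of the objective usually associated with Jenks natural breaks: split the sorted values into contiguous classes so that the total within-class sum of squared deviations is smallest. This is a plain dynamic-programming illustration under that assumption, not the method of the abstract above and not a GIS library routine; the function name `jenks_classes` and its parameters are hypothetical.

```python
def jenks_classes(values, n_classes):
    """Partition values into n_classes contiguous groups of the sorted data,
    minimizing the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j], j > i."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk the split table backwards to recover the classes.
    classes = []
    j = n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes
```

Note that `n_classes` is a required argument: the optimization is only defined once the number of classes is fixed, which is the drawback the abstract points out.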