Data mining concerns theories, methodologies, and in particular, computer systems for knowledge extraction or mining from large amounts of data. Association rule mining is a general purpose rule discovery scheme. It has been widely used for discovering rules in medical applications. The diagnosis of diseases is a significant and tedious task in medicine. The detection of heart disease from various...
This paper studies a clustering problem with balancing constraints, meaning that the number of samples in each cluster must be at least a pre-given value. A modified k-means clustering algorithm is proposed, which adopts a heuristic cluster assignment algorithm to deal with the balancing constraints. Numerical computation shows that the proposed algorithm can deal with the balancing...
Living in the modern technology-dependent world, we rely heavily on electronically stored data and information to reach sound and timely decisions. Across the information technology world there exists an enormous volume of data containing information relevant to many different fields. But a problem emerges when we are interested in finding out about...
The traditional k-means algorithm is sensitive to the choice of initial centers. To solve this problem, this paper proposes a new method for finding the initial centers that reduces the sensitivity of the k-means algorithm to them. The algorithm first computes the density of the area to which each data object belongs; then it finds k data objects that belong to high-density areas as the initial...
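The density-based initialization described above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract is truncated, so the density definition (neighbor count within `radius`), the `radius` parameter, and the rule that chosen centers must lie more than `radius` apart are all assumptions, and `density_initial_centers` is a hypothetical name.

```python
import math


def density_initial_centers(points, k, radius):
    """Pick up to k initial centers from high-density regions (illustrative)."""
    n = len(points)
    # Assumed density measure: number of other points within `radius`.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(n), key=lambda i: -density[i])
    centers = []
    for i in order:
        # Assumption: skip candidates within `radius` of a chosen center,
        # so the centers do not all come from the same dense region.
        if all(math.dist(points[i], c) > radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers = density_initial_centers(points, k=2, radius=2)
```

The returned points would then seed an ordinary k-means run in place of random initial centers. If fewer than k candidates satisfy the spacing rule, the function returns fewer than k centers.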
Finding subsets of similar crime cases is an important task for intelligence analysts in crime investigation. It can not only provide multiple clues for solving crimes but also make catching the criminals more efficient. However, the conventional approach of querying specific attributes in relational databases has two defects: first, it is relatively inefficient when a large number of incidents have to be handled;...
Previous studies have focused on several aspects of CRM (Customer Relationship Management). However, there is a lack of research that focuses on the customer segmentation of shipping enterprises using data mining. Data mining technology can be used in modern CRM to greatly enhance its function and efficiency. Based on the technologies of clustering and classification in data mining, this paper...
To address the current problem of serious academic plagiarism in the web environment, this article proposes a text copy detection algorithm based on the topic bag; the algorithm uses the ideas of semantic clustering and multi-instance learning. Firstly, a paper is divided into a three-layer tree structure: a leaf node denotes a sentence; a branch node represents a topic bag, and...
Although the fuzzy k-modes algorithm removes the numeric-only limitation of the k-means algorithm, representing each attribute of the centroid with a single category value and using a simple distance measure compromise its precision and make it prone to falling into local optima. In this paper, an extended fuzzy k-means (xFKM) algorithm for clustering categorical-valued data is presented, in which...
In this paper, a new scalable hybrid fuzzy clustering algorithm that incorporates Fuzzy C-means into the Quantum-behaved Particle Swarm Optimization (QPSO) algorithm is proposed. QPSO has fewer parameters and a stronger global convergence capability than the Particle Swarm Optimization algorithm. So the iteration algorithm is replaced by the new hybrid algorithm based on the gradient...
In recent years, feature extraction methods have made advances in pattern recognition and computer vision. Feature extraction not only extracts features useful for classification but also reduces the dimension of pattern samples. In this paper, we propose orthogonal supervised spectral discriminant analysis (OSSDA), which is motivated by marginal Fisher analysis (MFA) and spectral clustering. It puts different weights...
In this paper we present a novel clustering analysis method based on the Most Similar Relation Diagram (MSRD). MSRD is a diagram in which each datum of a dataset is linked to its most similar datum. By cutting off some links in the diagram, a certain number of clusters are formed. A comparison of the MSRD method with the hierarchical method is carried out. Clustering experiments using MSRD were done and the...
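The link-and-cut idea above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how similarity is measured or which links are cut, so Euclidean distance, "cut the longest links first", and the name `msrd_clusters` are all hypothetical choices.

```python
import math


def msrd_clusters(points, n_clusters):
    """Link each point to its nearest neighbor, then cut the longest links."""
    n = len(points)
    links = []
    for i in range(n):
        # Most similar datum = nearest other point (assumed similarity).
        j = min((m for m in range(n) if m != i),
                key=lambda m: math.dist(points[i], points[m]))
        links.append((math.dist(points[i], points[j]), i, j))
    links.sort()  # shortest (most similar) links first

    def components(kept):
        # Union-find over the links that have not been cut.
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for _, a, b in kept:
            parent[find(a)] = find(b)
        return [find(x) for x in range(n)]

    keep = len(links)
    while True:
        labels = components(links[:keep])
        # Stop once enough groups exist, or when every link has been cut.
        if len(set(labels)) >= n_clusters or keep == 0:
            return labels
        keep -= 1  # cut the longest remaining link


labels = msrd_clusters([(0,), (1,), (3,), (6,)], n_clusters=2)
```

Because the nearest-neighbor diagram may already fall into more groups than requested, this sketch returns at least `n_clusters` groups rather than exactly that many.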
Clustering may be named as the first clustering technique addressed by the research community since the 1960s. However, as databases continue to grow in size, numerous research studies have been undertaken to develop more efficient clustering algorithms and to improve the performance of existing ones. This paper demonstrates a general optimization technique applicable to clustering algorithms with a need...
This paper presents an algorithm, TBCClustering, for clustering GML documents using maximal frequent induced subtree patterns. TBCClustering mines the maximal frequent induced subtrees by using the structural information of GML documents; it can obtain the best minimum support automatically, and then chooses a set of subtree patterns to form the optimistic clustering features. Finally it uses CLOPE...
Cognitive maps, one of the hot topics in computational intelligence research, have been widely used in knowledge representation and decision-making. When cognitive maps are mined from data resources, outlier data seriously affect their accuracy. Therefore, based on an analysis of traditional algorithms, this paper proposes a new outlier data detection algorithm. The algorithm...
Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case-base grows at a fast rate, and this makes the case retrieval process inefficient. Using Case-Base Maintenance learning via the GC (Generalization Capability) algorithm, which can reduce the number of cases used by the KNN algorithm, can improve efficiency...
Literature-based discovery links two or more literature concepts that have heretofore not been linked (i.e., are disjoint), in order to produce novel, interesting, plausible, and intelligible knowledge. Cluster analysis is the core of literature-based discovery. This paper proposes an improved fuzzy c-means (FCM) algorithm based on the analysis of existing clustering analysis of literature-based...
This paper describes a model that discovers association rules from a medical database to help doctors treat and diagnose a group of patients who show similar prehistoric medical symptoms. The proposed data mining procedure consists of two modules. The first is a clustering module that is based on a neural network, Adaptive Resonance Theory 2 (ART2), which performs affinity grouping tasks on a large...
This paper proposes a new method for clustering law texts based on the referential relations among laws. We extract law entities (an entity represents a law) and their referential relations from law texts. The SimRank algorithm is then applied to calculate the similarity of law entities through these referential relations, and law clustering is carried out based on the SimRank similarity. This is the first time to apply SimRank...
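The SimRank step above can be sketched with the standard iterative formulation, in which two nodes are similar when they are referenced by similar nodes. The graph encoding (a dict of in-neighbors), the decay factor of 0.8, the fixed iteration count, and the toy law names are assumptions for illustration; the abstract does not state the paper's settings.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Iterative SimRank over a graph given as {node: [nodes referring to it]}.

    Every node that appears in a neighbor list must also be a key.
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node nothing refers to is similar only to itself.
                    new[a][b] = 0.0
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: law C refers to both law A and law B.
scores = simrank({"A": ["C"], "B": ["C"], "C": []})
```

The resulting pairwise scores could then feed any similarity-based clustering method. Whether similarity should follow incoming or outgoing references in the law graph is a modelling choice this sketch does not settle.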
Privacy-preserving data mining (PPDM) is a novel research direction that aims to protect sensitive knowledge from disclosure. Many researchers in this area have recently made efforts to preserve privacy for sensitive association rules in statistical databases. In this paper, we propose a heuristic algorithm named DSRRC (Decrease Support of R.H.S. item of Rule Clusters), which provides privacy...
Data clustering is one of the powerful techniques for knowledge discovery from data. In this paper, a novel approach to hierarchical clustering over a non-binary search space is proposed. Besides the agglomerative methods, the proposed algorithm considers the Strength of Presence associated with each transaction, to yield quality clusters which are closer to the real-life situation...