Clustering is an exploratory data analysis technique that divides a dataset into groups. The groups are formed so that items with similar features fall in the same group and items with dissimilar features fall in different groups. Many clustering algorithms are available, and different kinds of algorithms suit different kinds of data. K-means is the most used clustering...
This paper presents the Weight Computing in Competitive K-Means algorithm, which is derived from the Improved K-means method and subspace clustering. By adding weights to the objective function, the contributions from each feature of each cluster can simultaneously minimize the separation within clusters and maximize the separation between clusters. The experiments described in this paper confirm good...
In this paper, the structure and properties of neural networks with quadratic junction are presented. Unsupervised learning rules for these neural networks are given. Using this kind of neural network, an ART-based hierarchical clustering algorithm is suggested. The algorithm can determine the number of clusters and cluster the data. The time and space complexity of the algorithm are discussed. A 2-D...
Most web text clustering is based on the vector space text representation model. This results in a high-dimensional term space, which increases time complexity and loses text semantics because the semantic relationships between terms are not considered. In this paper, a new approach is taken where a concept lattice is generated with text treated as object and terms of...
This paper introduces a new description-centric algorithm for web document clustering based on Memetic Algorithms with Niching Methods, Term-Document Matrix and Bayesian Information Criterion. The algorithm defines the number of clusters automatically. The Memetic Algorithm provides a combined global and local strategy for a search in the solution space and the Niching methods to promote diversity...
Clustering is a form of unsupervised classification that aims at grouping data points based on similarity. In this paper, we propose a new partitional clustering algorithm based on the notion of 'contribution of a data point'. We apply the algorithm to content-based image retrieval and compare its performance with that of the k-means clustering algorithm. Unlike the k-means algorithm, our algorithm...
Nowadays, clustering algorithms are widely used in commercial applications such as customer analysis, where they have achieved good results. The K-means algorithm is by far the most commonly used clustering method. However, its time consumption is fairly high on larger-scale data. In this paper, we improve the K-means algorithm. Our improvement is based on the triangle inequality...
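The abstract above is truncated, so the paper's exact method is not shown here. As a rough illustration of how the triangle inequality can save distance computations in a K-means assignment step, the following minimal sketch uses the well-known bound that if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so center c' cannot beat c for point x. The function name `assign_with_pruning` and the `skipped` counter are hypothetical, introduced only for this example.

```python
import math


def assign_with_pruning(points, centers):
    """Nearest-center assignment that skips provably unnecessary distance calls.

    Illustrative sketch only; not the method of the paper quoted above.
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, centers[0])
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) <= d(p, c_best) + d(p, c_j).
            # Hence d(c_best, c_j) >= 2 * d(p, c_best) implies
            # d(p, c_j) >= d(p, c_best), and center j can be skipped.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The pruning pays off most when clusters are well separated, because then most center-to-center distances are large compared with the distance from a point to its current best center.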
Cluster analysis is one of the main analytical methods in data mining, and the choice of clustering algorithm directly influences the clustering results. This paper discusses the standard k-means clustering algorithm and analyzes its shortcomings; for example, the k-means clustering algorithm has to calculate the distance between each data object and all cluster...
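For orientation, the standard k-means procedure that several of these abstracts take as their baseline can be sketched as follows. This is a minimal, unoptimized version under stated assumptions: initial centers are sampled from the input points, Euclidean distance is used, and the names `kmeans`, `max_iter`, and `seed` are hypothetical choices for this example. The assignment step shows the cost noted above, since every point is compared against all k centers in each iteration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k sampled input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point is compared against all k centers.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so centers are stable too
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels
```

Each iteration costs on the order of n * k distance computations for n points, which is why running time grows linearly with the number of clusters.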
Most of the effort in the semi-supervised clustering literature has been devoted to variations of the K-means algorithm. In this paper we show how background knowledge can be used to bias a partitional density-based clustering algorithm. Our work describes how labeled objects can be used to help the algorithm detect suitable density parameters for the algorithm to extract density-based clusters in specific...
The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of the k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height balanced trees to address this issue. Specifically, we make two major contributions, (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means),...
The traditional k-means algorithm cannot achieve high clustering precision and is easily affected by randomly initialized cluster centers and isolated points, but the algorithm is simple, has low time complexity, and can process large data sets quickly. This paper proposes an improved k-means algorithm named PKM. PKM is based on the similarity degree among data points produced by cumulated K-means, and gets the...
The K-means algorithm is one of the most popular clustering algorithms. However, it is sensitive to the initial partition and to circular datasets. To address this problem, this paper introduces an improved k-means algorithm based on multiple feature points. The algorithm selects a number of feature points as cluster centroids, unlike the traditional algorithm, which uses only one centroid. In addition,...
Edge detection is a problem of fundamental importance in image analysis. Many approaches to edge detection have already been proposed, and more are yet to come. But edge detection using the K-means algorithm is the most heuristic and unique approach. In this paper, we propose an algorithmic technique to detect the edges of any kind of true grayscale image, considering the artificial features of the image...
Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, which include partitional, hierarchical, mixture model based, density-based, spectral, subspace, and so on. The focus of this paper is on full-dimensional, arbitrary shaped clusters. Existing methods for this problem suffer either in terms of the memory or time complexity...