Cluster analysis deals with the problem of organizing a collection of data objects into clusters based on similarity. It is also known as unsupervised classification of objects and has found many applications in different areas. An important component of a clustering algorithm is the distance measure, which is used to find the similarity between data objects. K-means is one of the most...
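Since the abstract singles out the distance measure and k-means, here is a minimal sketch of Lloyd's k-means with Euclidean distance in plain Python. It assumes points are equal-length numeric tuples; initializing from k random data points and the iteration cap are illustrative choices, not taken from the paper.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Swapping `euclidean` for another function changes which centroid each point is assigned to, which is why the choice of distance measure matters so much.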
This paper presents a data clustering algorithm based on the natural behavior of social insects, using a multiple-colony, multiple-food-source concept: agents from each colony carry food back to their colony with the aim of grouping the food. The proposed algorithm is a distributed data clustering algorithm based on multiple swarm-like agent colonies. Its advantages are distributed data clustering and heterogeneous...
In this paper, a new clustering model called generalized possibilistic c-means (GPCM) is proposed, and an efficient global optimization technique, the differential evolution algorithm, is used to optimize the proposed model. GPCM modifies possibilistic c-means (PCM) by restricting each cluster center to its own fixed feasible region. The feasible region is determined by the fuzzy c-means clustering...
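The abstract does not give GPCM's feasible-region constraint or its differential-evolution encoding, so the sketch below only shows the standard per-sample update formulas that GPCM builds on: the fuzzy c-means membership u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)) and the possibilistic typicality t_i = 1 / (1 + (d_i² / η_i)^(1/(m-1))). The fuzzifier m = 2 and the bandwidths η_i are illustrative assumptions.

```python
def fcm_memberships(dists, m=2.0):
    """Fuzzy c-means memberships of one sample, given its distances to all c centers.
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)); the memberships sum to 1."""
    # A zero distance means the sample sits on a center: give it full membership there.
    if any(d == 0 for d in dists):
        zeros = [i for i, d in enumerate(dists) if d == 0]
        return [1.0 / len(zeros) if d == 0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]

def pcm_typicalities(dists, etas, m=2.0):
    """Possibilistic c-means typicality of one sample for each cluster:
    t_i = 1 / (1 + (d_i**2 / eta_i) ** (1 / (m - 1))). Values need not sum to 1."""
    p = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / eta) ** p) for d, eta in zip(dists, etas)]
```

The key contrast is that FCM memberships are forced to sum to 1 across clusters, while PCM typicalities are computed per cluster, so a noise point far from every center gets low typicality everywhere.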
In this article, a distributed clustering technique suitable for large data sets is presented. The algorithm is a modified version of the widely used k-means algorithm, with changes that make it executable in a distributed environment. For large inputs, the running time of the k-means algorithm is very high and is measured as O(TKN), where K is...
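The abstract cuts off before describing the modifications, so the sketch below only illustrates a common map-reduce style decomposition of one k-means iteration, assumed here for illustration: each worker assigns its local points and returns per-cluster sums and counts, and a coordinator merges them into new centroids. The function names are hypothetical, and the workers are simulated sequentially.

```python
def local_stats(partition, centroids):
    """Run on one worker: assign local points, return per-cluster (sum vectors, counts)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def merge_stats(stats, old_centroids):
    """Run on the coordinator: combine partial sums and counts into new centroids."""
    k, dim = len(old_centroids), len(old_centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # An empty cluster keeps its previous centroid.
    return [[s / total_counts[j] for s in total_sums[j]] if total_counts[j]
            else list(old_centroids[j]) for j in range(k)]

def distributed_kmeans_step(partitions, centroids):
    """One k-means iteration over data split across workers (simulated sequentially here)."""
    return merge_stats([local_stats(part, centroids) for part in partitions], centroids)
```

Each worker sends only K sum vectors and K counts per iteration, so the raw data stay in their partitions; the O(TKN) distance computations are split across workers rather than eliminated.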
Point symmetry-based clustering is an important unsupervised learning tool for recognizing symmetrical clusters of convex or non-convex shape, including in microarray datasets. To enable fast clustering of such large data, this article proposes a distributed, space- and time-efficient, scalable parallel approach for the point symmetry-based K-means algorithm. A natural basis for analyzing gene...
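One widely cited definition of the point symmetry (PS) distance, assumed here because the abstract does not give a formula, reflects a point x through a candidate center c to x* = 2c - x, averages the distances from x* to its knear nearest data points (d_sym), and multiplies by the Euclidean distance between x and c. A brute-force sketch with knear = 2:

```python
import math

def dist(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def point_symmetry_distance(x, center, data, knear=2):
    """PS distance of x with respect to center: reflect x through center, average the
    distances from the reflected point to its `knear` nearest data points, then scale
    by the Euclidean distance from x to center."""
    reflected = [2 * c - xi for xi, c in zip(x, center)]
    nearest = sorted(dist(reflected, p) for p in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * dist(x, center)
```

A small d_sym means the data contain a near-mirror image of x about c, which is what lets PS-based k-means find symmetric, non-spherical clusters. The full sort over all points costs O(N log N) per query, the kind of cost that motivates parallel or indexed approaches for large data.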
In this paper we develop a connectivity-based cluster validity index. The index can automatically detect the number of clusters in data sets whose well-separated clusters may have any shape, size, or convexity. The proposed index, connect-index, uses the relative neighborhood graph to measure the amount of "connectedness" of a particular...
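The relative neighborhood graph (RNG) that connect-index relies on has a simple definition: two points are joined unless some third point is closer to both of them than they are to each other. A brute-force O(n³) sketch using only the standard library (`math.dist` needs Python 3.8+); how connect-index turns the graph into a score is not given in the abstract and is not shown.

```python
import math
from itertools import combinations

def relative_neighborhood_graph(points):
    """Edges (i, j) of the RNG: points i and j are joined unless some third point r
    satisfies max(d(i, r), d(j, r)) < d(i, j). Brute force, O(n^3)."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    edges = []
    for i, j in combinations(range(n), 2):
        blocked = any(max(d[i][r], d[j][r]) < d[i][j]
                      for r in range(n) if r not in (i, j))
        if not blocked:
            edges.append((i, j))
    return edges
```

For three collinear points (0, 0), (1, 0), (2, 0), the outer pair is not joined because the middle point is closer to both ends than they are to each other, so the graph follows the chain of the data.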
In this article, a new ant clustering algorithm based on case-based reasoning (CBR) is presented. Every ant has a case base that is updated iteratively through the CBR process. An ant that succeeds in dropping an item becomes an expert and can use its knowledge for items it picks up later. Expert ants can also cooperate to share their knowledge for even better clustering. Our simulation...
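Case-based reasoning aside, ant clustering algorithms typically rest on pick-up and drop probabilities driven by local item density. The sketch below assumes the basic Deneubourg-style rules p_pick = (k1 / (k1 + f))² and p_drop = (f / (k2 + f))², with the density f estimated Lumer-Faieta style over an s × s patch. The constants k1, k2, alpha and s are illustrative, and the article's expert-ant mechanism is not modeled.

```python
import math

def local_density(item, neighbours, alpha=0.5, s=3):
    """Perceived similarity of `item` to the items in its s x s neighbourhood,
    clipped at 0: max(0, sum(1 - d / alpha) / s**2)."""
    total = sum(1.0 - math.dist(item, o) / alpha for o in neighbours)
    return max(0.0, total / (s * s))

def pick_probability(f, k1=0.1):
    """Probability that an unladen ant picks up an item with local density f."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden ant drops its item where the local density is f."""
    return (f / (k2 + f)) ** 2
```

Isolated items (low f) are likely to be picked up and unlikely to be dropped, while items among similar neighbours stay put, so piles of similar items grow over time.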
Clustering algorithms partition data sets into groups of objects such that the pairwise similarity between objects within the same cluster is higher than that between objects assigned to different clusters. Defining a similarity measure becomes challenging in the presence of categorical data, and the choice affects the quality and meaningfulness of the clusters formed. Furthermore, the curse of dimensionality diminishes the...
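As a baseline for categorical data, the k-modes algorithm replaces Euclidean distance with simple matching dissimilarity (the count of mismatched attributes) and replaces means with attribute-wise modes. The sketch below shows both pieces; it is not the measure proposed in the paper, whose details are cut off.

```python
from collections import Counter

def simple_matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)

def mode_of(records):
    """Attribute-wise most frequent category: the k-modes analogue of a centroid."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))
```

Simple matching treats every mismatch as equally distant, which is exactly the weakness that motivates richer categorical similarity measures.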
We present an online adaptive clustering algorithm in a decision tree framework that has an adaptive tree and a code formation layer. The code formation layer stores the representative codes of the clusters, and the tree adapts the separating hyperplanes between the clusters. The tree decides which cluster a sample belongs to, and the tree parameters are guided by the stored codes. The model...
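The tree and code formation layer are not specified beyond the abstract. As a flat point of comparison, the sketch below shows the simplest online clustering update, sequential k-means (winner-take-all competitive learning): each arriving sample moves only its nearest stored code a fraction `lr` toward itself. The class name and learning rate are illustrative assumptions.

```python
import math

class OnlineClusterer:
    """Sequential (online) k-means over a fixed set of stored codes."""

    def __init__(self, codes, lr=0.1):
        self.codes = [list(c) for c in codes]
        self.lr = lr

    def update(self, x):
        """Assign x to its nearest code, move that code toward x, return its index."""
        j = min(range(len(self.codes)), key=lambda i: math.dist(x, self.codes[i]))
        self.codes[j] = [c + self.lr * (xi - c) for c, xi in zip(self.codes[j], x)]
        return j
```

Here membership is decided by nearest code alone; the article's tree instead learns separating hyperplanes, guided by codes of this kind.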
A novel fuzzy clustering algorithm, AWFCM, is proposed to address the severe misclustering and failure of the fuzzy C-means algorithm in the presence of noise and unevenly distributed samples. The new algorithm defines a new distance in a new metric space and introduces a weight matrix based on the density of sample points. The new distance definition can efficiently restrain the error range of the cluster centers for samples with...
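The abstract does not define AWFCM's distance or weight matrix. The sketch below illustrates the general idea of sample weighting in fuzzy c-means under a hypothetical density weight (the number of neighbours within a radius), plugged into the weighted center update v_i = Σ_j w_j u_ij^m x_j / Σ_j w_j u_ij^m. Both functions are assumptions, not the paper's definitions.

```python
import math

def density_weights(points, radius):
    """Hypothetical density weight: 1 + number of other points within `radius`,
    normalised so the weights sum to len(points). Isolated points get small weights."""
    n = len(points)
    raw = [1 + sum(1 for j in range(n)
                   if j != i and math.dist(points[i], points[j]) <= radius)
           for i in range(n)]
    total = sum(raw)
    return [n * r / total for r in raw]

def weighted_fcm_centers(points, memberships, weights, m=2.0):
    """Centers of a sample-weighted fuzzy c-means:
    v_i = sum_j w_j u_ij^m x_j / sum_j w_j u_ij^m,
    where memberships[i][j] is the membership of point j in cluster i."""
    dim = len(points[0])
    centers = []
    for u_i in memberships:
        coef = [w * u ** m for w, u in zip(weights, u_i)]
        denom = sum(coef)
        centers.append([sum(c * p[d] for c, p in zip(coef, points)) / denom
                        for d in range(dim)])
    return centers
```

With all memberships equal to 1, the weighted center of (0, 0), (1, 0), (0, 1), (10, 10) with radius 1.5 has x-coordinate 1.3, versus 2.75 for the plain mean, so the isolated point pulls the center much less.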
In semi-supervised clustering, domain knowledge can be converted into constraints and used to guide the clustering. In this paper we propose a feature selection algorithm for semi-supervised clustering. In our method, features are treated as conditionally independent. Feature saliency is first computed in unsupervised clustering using the expectation-maximization model. It is then refined in the tuning step to...
In this paper we propose a new partial closure-based constrained clustering algorithm. We introduce closures into partial constrained clustering and propose a new measure to rank the importance of the constrained closures. Experiments on public datasets demonstrate the advantages of our algorithm over standard K-means and two state-of-the-art constrained clustering algorithms.
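Assuming a closure here means a set of points tied together transitively by must-link constraints (if a-b and b-c must link, so must a-c), closures are the connected components of the must-link graph and can be computed with union-find. The paper's notion of partial closures and its importance ordering are not shown.

```python
def must_link_closures(n, must_links):
    """Group points 0..n-1 into closures: the transitive closure of must-link pairs.
    Points not mentioned in any constraint form singleton closures."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two closures
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

A constrained k-means can then assign each closure as a single unit, for example by moving its whole set of points to one cluster at once.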
In recent years there has been a growing interest in clustering uncertain data. In contrast to traditional, "sharp" data representation models, uncertain data objects can be represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In this context, the focus has been mainly on partitional and density-based approaches, whereas hierarchical clustering...
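For independent uncertain objects A and B whose pdfs have means μ_A, μ_B and per-dimension variances, the expected squared Euclidean distance has a closed form: E‖A − B‖² = ‖μ_A − μ_B‖² + Σ var_A + Σ var_B. This standard identity is shown only to illustrate why uncertainty changes distances; the hierarchical method itself is cut off in the abstract.

```python
def expected_sq_distance(mean_a, var_a, mean_b, var_b):
    """E||A - B||^2 for independent uncertain objects A and B, given the per-dimension
    means and variances of their pdfs: ||mu_a - mu_b||^2 + sum(var_a) + sum(var_b)."""
    return (sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
            + sum(var_a) + sum(var_b))
```

Two uncertain objects with identical means are still at a positive expected distance, equal to their total variance, which a "sharp" representation of the same objects would report as zero.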
This paper develops a novel sample-based clustering technique. Since the traditional k-means algorithm is very time-consuming for large disk-resident data, sample-based, out-of-core clustering techniques have recently gained popularity. We use the concept of eliminating measurement errors by averaging over a number of samples. Here, samples of the original data set are chosen randomly,...
Stability is one aspect of a clustering algorithm that should be considered when choosing an appropriate algorithm for an unsupervised learning task. A clustering algorithm is stable (on a dataset) if, when run on a (sub)sample of the dataset, it produces the same clustering as it does on the whole dataset. In this paper, we report the results of an empirical study on the...
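One way to measure stability as defined above, assuming the Rand index as the agreement score (the study's actual measure is not stated in the truncated abstract), is to restrict the full-data clustering to the subsample and compare it pair by pair with the clustering obtained on the subsample alone.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (both together or both
    apart). Invariant to label permutation; 1.0 means identical partitions."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

def subsample_stability(full_labels, sub_indices, sub_labels):
    """Compare the clustering obtained on a subsample with the full-data clustering
    restricted to the same points."""
    restricted = [full_labels[i] for i in sub_indices]
    return rand_index(restricted, sub_labels)
```

Because the Rand index compares co-membership of pairs, it does not matter that the run on the subsample may number its clusters differently from the run on the full data.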
A non-metric distance measure for similarity estimation, based on the characteristics of differences, is presented. This distance is implemented in the well-known k-means clustering algorithm. To demonstrate the effectiveness of the proposed distance, its performance was compared with that of the Euclidean and Manhattan distances by clustering the Iris dataset from the UCI repository...
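The abstract does not define the proposed non-metric distance, so only the two baselines it was compared against are sketched: a k-means assignment step with a pluggable distance, where a new distance would simply be another function with the same signature.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, centroids, distance):
    """k-means assignment step: index of the nearest centroid under `distance`."""
    return [min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points]
```

For point (0, 0) and centroids (3, 3) and (5, 0), Euclidean distance picks the first centroid (about 4.24 < 5) while Manhattan distance picks the second (5 < 6). Note that the usual mean update only minimizes the Euclidean objective; under Manhattan distance the per-cluster minimizer is the coordinate-wise median.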
Following the clustering principles of data analysis, this paper proposes a clustering algorithm based on an artificial immune system. The algorithm is based on the immune mechanism by which antibodies capture antigens. The data to be clustered are viewed as antigens, and the cluster centers are viewed as antibodies in the immune system. The clustering is effectively the process...
After analyzing the disadvantages of the classical K-means clustering algorithm, this paper combines the core idea of K-means clustering with the PSO algorithm and proposes a new clustering method, called the clustering algorithm based on particle swarm optimization. It uses the global optimization ability of PSO to compensate for the shortcomings of the clustering method. The algorithm is...
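A minimal sketch of one common way to combine PSO with k-means-style clustering, assumed here because the paper's exact scheme is truncated: each particle encodes all K centroids as one flat vector, fitness is the sum of squared distances from each point to its nearest centroid, and particles follow the inertia-weight velocity update v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x). The parameter values are common defaults, not taken from the paper.

```python
import random

def sse(flat, points, k, dim):
    """Fitness: sum of squared distances from each point to its nearest centroid."""
    cents = [flat[j * dim:(j + 1) * dim] for j in range(k)]
    return sum(min(sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in cents)
               for p in points)

def pso_cluster(points, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over flattened centroid vectors. Returns (centroids, best SSE)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle starts at k randomly chosen data points, flattened into one vector.
    xs = [[v for p in rng.sample(points, k) for v in p] for _ in range(n_particles)]
    vs = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [sse(x, points, k, dim) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            f = sse(xs[i], points, k, dim)
            if f < pbest_f[i]:  # personal best improved
                pbest[i], pbest_f[i] = list(xs[i]), f
                if f < gbest_f:  # global best improved
                    gbest, gbest_f = list(xs[i]), f
    return [gbest[j * dim:(j + 1) * dim] for j in range(k)], gbest_f
```

Because the swarm explores many centroid configurations at once, it is less dependent on a single initialization than Lloyd's k-means; hybrids often finish by running a few k-means iterations from the PSO result.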
The self-organizing map (SOM) has been recognized as a powerful tool in cluster analysis. This paper presents a fuzzy SOM algorithm for mixed numeric and categorical data that integrates fuzzy set theory into model exploration through a fuzzy projection instead of a crisp projection. In addition, a hybrid clustering approach is proposed combining SOMs with partitive clustering algorithms for the sake of...
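For reference, the crisp SOM update that the fuzzy variant generalizes: find the best-matching unit (BMU) for input x, then pull every node toward x by lr · h, where h is a Gaussian of the node's grid distance from the BMU. The sketch uses a 1-D grid of numeric weight vectors; the fuzzy projection and categorical handling from the paper are not modeled, and lr and sigma are illustrative.

```python
import math

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One online update of a 1-D self-organizing map (crisp, not fuzzy).
    `weights` lists the node weight vectors in grid order. Returns (new_weights, bmu)."""
    bmu = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
    new = []
    for i, w in enumerate(weights):
        # Gaussian neighbourhood on the grid distance |i - bmu|.
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        new.append([wi + lr * h * (xi - wi) for wi, xi in zip(w, x)])
    return new, bmu
```

Nodes close to the BMU on the grid move almost as much as the BMU itself, while distant nodes barely move; this is what makes neighbouring nodes represent similar inputs.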
Clustering is one of the main techniques of data mining and is a form of unsupervised pattern recognition. Despite its popularity for general clustering, K-means suffers from two major shortcomings: the number of clusters K has to be supplied by the user, and the search is prone to local minima. This article unifies the particle swarm optimization (PSO) algorithm and the Bayesian information criterion (BIC),...
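How the article combines PSO with BIC is cut off, so this sketch shows only how BIC can score a candidate K. It assumes spherical Gaussian clusters with one shared variance and mixing weights n_k / n, and uses the convention BIC = −2 ln L + p ln n, where lower is better; this is one common modeling choice, not necessarily the article's.

```python
import math

def kmeans_bic(points, centroids, labels):
    """BIC (lower is better) of a hard k-means solution under a spherical Gaussian
    model with one shared variance and mixing weights n_k / n."""
    n, d, k = len(points), len(points[0]), len(centroids)
    rss = sum(sum((x - c) ** 2 for x, c in zip(p, centroids[lab]))
              for p, lab in zip(points, labels))
    if rss == 0:
        raise ValueError("zero within-cluster variance; BIC is undefined")
    var = rss / (n * d)  # maximum-likelihood shared variance
    counts = [labels.count(j) for j in range(k)]
    log_lik = (sum(math.log(counts[lab] / n) for lab in labels)
               - n * d / 2 * math.log(2 * math.pi * var)
               - n * d / 2)  # sum of ||x - c||^2 / (2 var) equals n*d/2 at the MLE
    n_params = (k - 1) + k * d + 1  # mixing weights, centroid coordinates, variance
    return -2 * log_lik + n_params * math.log(n)
```

To choose K, cluster the data for several candidate values and keep the K with the lowest score; the p ln n term penalizes extra clusters that only marginally improve the fit.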