We compare the performance of three parallel clustering algorithms (Canopy, K-means, and fuzzy K-means) in real cluster environments. By constructing cluster platforms of different scales, we compare these algorithms on three metrics: run time, speedup, and sizeup. Experimental results show that: (1) if both the data set and the number of nodes in the cluster are the same, both the runtime and the sizeup...
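The two less familiar metrics named above can be sketched as simple ratios. This is a minimal illustration that assumes the usual definitions from the parallel data mining literature (speedup compares one node against m nodes on the same data; sizeup compares a base data set against a larger one on the same nodes). The abstract does not state its exact definitions, and the function names and run times below are hypothetical.

```python
def speedup(time_one_node: float, time_m_nodes: float) -> float:
    """Speedup: how many times faster m nodes finish the same data set."""
    return time_one_node / time_m_nodes


def sizeup(time_base_data: float, time_scaled_data: float) -> float:
    """Sizeup: how much longer the same nodes take once the data set has grown."""
    return time_scaled_data / time_base_data


# Hypothetical run times in seconds (not taken from the paper).
print(speedup(120.0, 40.0))  # 3.0
print(sizeup(40.0, 100.0))   # 2.5
```

A speedup close to the number of nodes, or a sizeup below the data growth factor, would indicate good scaling under these assumed definitions.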
This paper addresses the problem of estimating the correct number of components in a Gaussian mixture given a sample data set. In particular, an extension of the Gaussian-means (G-means) and projected Gaussian-means (PG-means) algorithms is proposed. All these methods are based on a one-dimensional statistical hypothesis test. G-means and PG-means are wrapper algorithms of the k-means and expectation-maximization...
In this paper, we propose a new ant-based clustering algorithm. The algorithm takes inspiration from the sound communication properties of real ants. Artificial ants communicate directly with each other in order to merge similar groups of objects. The proposed algorithm was tested and evaluated. The obtained results are very encouraging in comparison with the well-known k-means and some ant-based clustering...
K-means is a clustering algorithm that is widely applied in many fields, including pattern classification and multimedia analysis. Due to real-time requirements and computational-cost constraints in embedded systems, it is necessary to accelerate the k-means algorithm through hardware implementation in SoC environments, where the bandwidth of the system bus is strictly limited. In this paper, a bandwidth...
Content-based video indexing and retrieval traces back to elementary video structures, such as a table of contents. Thus, algorithms for video partitioning have become crucial with the unremitting growth of digital video technology. This calls for a tool that breaks the video down into smaller, manageable units called shots. In this paper, a shot boundary detection technique...
In this article, a distributed clustering technique that is suitable for dealing with large data sets is presented. The algorithm is a modified version of the very common k-means algorithm, with suitable changes to make it executable in a distributed environment. For large input sizes, the running time complexity of the k-means algorithm is very high and is measured as O(TKN), where K is...
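The O(TKN) cost mentioned above can be made concrete by counting distance evaluations in a plain, single-machine k-means loop. This is only a sketch of the standard Lloyd-style algorithm, not the distributed variant the article proposes. It assumes the conventional reading of the symbols (T iterations, K clusters, N points), which the truncated abstract does not confirm, and the function name `kmeans` and the sample points are hypothetical.

```python
import random


def kmeans(points, k, iterations, seed=0):
    """Plain Lloyd-style k-means on tuples of floats.

    Runs a fixed number of iterations and returns
    (centroids, distance_evaluations).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    distance_evaluations = 0
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # One squared-distance evaluation per (point, centroid) pair.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            distance_evaluations += k
            clusters[dists.index(min(dists))].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, distance_evaluations


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, evals = kmeans(points, k=2, iterations=5)
print(evals)  # T * K * N = 5 * 2 * 4 = 40
```

Because every iteration compares every point with every centroid, the count grows linearly in each of the three factors, which is why splitting the N points across machines is a natural way to distribute the work.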
Image clustering and categorization is a means for high-level description of image content. In the field of content-based image retrieval (CBIR), the analysis of gray scale images has gained great importance because of its wide range of applications, from satellite images to medical images. But the analysis of an image with such a number of gray shades becomes very complex, so, for simplicity we cluster...
We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalises the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (ED) between objects and cluster representatives. For arbitrary pdfs,...
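To illustrate what an expected distance involves, the sketch below approximates ED between one uncertain object and one cluster representative by a weighted sum over sample points of the object's pdf. This sampling approach is an assumption made for illustration, since the truncated abstract does not say how ED is evaluated for arbitrary pdfs. The function name `expected_distance` and the sample data are hypothetical.

```python
import math


def expected_distance(samples, weights, representative):
    """Approximate ED(o, c) as a weighted mean of Euclidean distances
    from sampled locations of the uncertain object to the representative."""
    total_weight = sum(weights)
    weighted_sum = sum(
        w * math.dist(x, representative) for x, w in zip(samples, weights)
    )
    return weighted_sum / total_weight


# Hypothetical uncertain object: four equally likely locations around (0, 0).
samples = [(3.0, 4.0), (-3.0, 4.0), (3.0, -4.0), (-3.0, -4.0)]
weights = [0.25, 0.25, 0.25, 0.25]
ed = expected_distance(samples, weights, (0.0, 0.0))
print(ed)
```

Each sampled location here lies at distance 5 from the representative, so the estimate is 5 even though the object's mean location coincides with the representative. Every (object, representative) pair costs one distance computation per sample, which suggests why repeating this across all pairs and iterations becomes expensive.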
Conventional clustering algorithms in data mining, such as the k-means algorithm, have difficulty handling the challenges posed by natural data, which is often vague and uncertain. The modeling of imprecise and qualitative knowledge, as well as the handling of uncertainty at various stages, is possible through the use of fuzzy sets. Fuzzy logic is capable of supporting to a reasonable...
Seismic exploration plays an important role in the petroleum industry. It is widely acknowledged that conventional data analysis methods in the oil and gas industry have many limitations. Traditional methods in petroleum engineering are knowledge-driven and often neglect some underlying factors. In contrast, data mining deals with large volumes of data and does not overlook important phenomena. Due...
This paper presents a new approach for the inspection of printed matter flaws based on K-means clustering (KM) and principal component analysis (PCA). PCA is a method that transforms original data containing many correlated vectors into new data containing fewer, uncorrelated vectors, while keeping...
Fast and high-quality document clustering algorithms play an important role in organizing large amounts of information into a small number of meaningful clusters. Traditional clustering algorithms search only a small subset of all possible clusterings and, consequently, there is no guarantee that the solution found will be optimal. This paper presents Procreant PSO (PPSO) algorithm...
Document clustering is the process of partitioning a set of unlabelled documents into clusters. To analyze the documents efficiently and effectively, all documents in each cluster are expected to share some concept. The shared concept is most conveniently represented using some key terms. Many methods have been studied for selecting important key terms. However, most of them belong to...
Web pages from different network sources differ in form and style, so it is difficult to obtain an optimal model by learning from hybrid training pages. In order to improve the accuracy of information extraction, a new approach based on a clustering generalized hidden Markov model is proposed. In this approach, a clustering algorithm is applied to web information extraction. The...
Several problems arise when the K-NN (K-nearest neighbor) method is used to classify Holter waveforms: the data scale is too large; the classification algorithm needs training samples; and K-NN is a linear classification method. Therefore, this paper proposes a new K-NN algorithm based on a kernel function. Through this change, classification is transformed from linear to non-linear...
In this paper we evaluate the following methods: Ant Colony-inspired Clustering, an Ant Colony-inspired method for Decision Tree generation, and Radial Basis Function Neural Networks with different learning algorithms, and compare them to classical approaches such as hierarchical clustering and k-means. We have evaluated the methods on the annotated MIT-BIH database. In the case of...
We present a new clustering method that uses a very limited amount of labeled data and employs two pairwise rules, namely must-link and cannot-link, and a single-wise one, cannot-cluster. It is demonstrated that incorporating these rules into the intelligent k-means algorithm may increase the accuracy of results; this is shown with experiments where the real number of clusters in the...
This paper explores an image fusion algorithm based on a self-adaptive fuzzy clustering algorithm. A clustering method that combines the nearest neighbor clustering method with the k-means clustering method (NNKM) is adopted for pixel classification. The membership of every image pixel to each cluster center is introduced. The fused image membership obtained by the maximum rule is adopted as the weighting coefficient...
Soft subspace clustering algorithms have recently received wide interest because of their scalability and flexibility in handling high-dimensional sparse data. A disadvantage of existing algorithms is that their clustering results are greatly affected by the quality of the initial centroids selected by random initialization. In this paper, we propose a heuristically weighting K-means algorithm and a corresponding...
In this paper we propose a clustering method based on a combination of particle swarm optimization (PSO) and the k-means algorithm. The PSO algorithm has been shown to converge successfully during the initial stages of a global search, but near the global optimum the search process becomes very slow. In contrast, the k-means algorithm can converge faster to an optimum solution. At the same time,...