Discovering clusters of varying shapes, sizes and densities in a data set is still a challenging problem for density-based algorithms. Recently presented approaches either require input parameters that encode information about the structure of the data set, or are restricted to two-dimensional data. In this paper, we present a density-based clustering algorithm, which uses the fuzzy proximity...
We present a framework for unsupervised image categorization in which images containing specific objects are taken as vertices in a hypergraph and the task of image clustering is formulated as the problem of hypergraph partition. First, a novel method is proposed to select the region of interest (ROI) of each image, and then hyperedges are constructed based on shape and appearance features extracted...
Classical validity indices are limited to clusters of specific geometrical shapes, and do not allow for discovering the natural cluster structure in data. Moving from the traditional coherent gene expression clustering to exploring the connectivity of gene expression patterns demands the use of more efficient validity indices. In this work, the application of a novel validity measure to gene expression...
In traditional grid clustering algorithms, the clustering results consist only of dense grids, so the clustering quality is low, and these algorithms are unable to cluster multi-density datasets. In this paper, we propose a clustering algorithm based on grid and boundary over multi-density datasets. In order to describe the data distribution, boundary grid is introduced and checked by...
Some of the major challenges in current clustering applications include: some data sets are so huge that it is difficult to load the entire data sets into memory for clustering, the data sets are often distributed over different locations for various reasons, which makes it impossible to process them centrally, and when lacking prior knowledge of the unknown data sets, it is troublesome to choose...
To overcome the problems of Euclidean distance based clustering algorithms, an efficient algorithm, CES, is proposed. A distance metric derived from the infinite norm is introduced to measure similarities between objects. Through this distance metric, neighbor searching is converted to searching the intersection of projection sets, which speeds up the clustering process. An efficient neighbor searching...
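The idea of turning neighbor search under an infinite-norm (Chebyshev) distance into an intersection of per-dimension projection sets can be sketched as follows. This is only an illustration of the general principle, not the CES algorithm itself, whose details are cut off in the abstract; the function names `build_projections` and `chebyshev_neighbors` and the radius parameter `eps` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def chebyshev(a, b):
    """Infinite-norm distance: the largest per-coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def build_projections(points):
    """For each dimension, store the sorted coordinate values and the
    point indices in that same order (hypothetical helper)."""
    projections = []
    for d in range(len(points[0])):
        order = sorted(range(len(points)), key=lambda i: points[i][d])
        values = [points[i][d] for i in order]
        projections.append((values, order))
    return projections


def chebyshev_neighbors(points, projections, q, eps):
    """Indices of points within Chebyshev distance eps of points[q].

    A point lies within eps under the infinite norm exactly when it lies
    within eps of the query in every single dimension, so the neighbor set
    is the intersection of the per-dimension candidate sets.
    """
    result = None
    for d, (values, order) in enumerate(projections):
        lo = bisect_left(values, points[q][d] - eps)
        hi = bisect_right(values, points[q][d] + eps)
        candidates = set(order[lo:hi])
        result = candidates if result is None else result & candidates
    result.discard(q)  # the query point is not its own neighbor
    return result


pts = [(0, 0), (1, 1), (2, 0), (0, 3), (5, 5)]
proj = build_projections(pts)
print(chebyshev_neighbors(pts, proj, 0, 2))
```

Each per-dimension lookup is a binary search over a sorted projection, so no pairwise distance needs to be computed during the search; `chebyshev` is included only to check the result against a brute-force scan.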
In this paper we introduce a compactness based clustering algorithm. The compactness of a data class is measured by comparing the inter-subset and intra-subset distances. The class compactness of a subset is defined as the ratio of the two distances. A subset is called an isolated cluster (or icluster) if its class compactness is greater than 1. All iclusters make a containment tree. We introduce...
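The compactness test described above can be illustrated with a small sketch. The abstract is truncated and does not say how the two distances are measured, so the definitions here are assumptions: the inter-subset distance is taken as the smallest distance from a point in the subset to a point outside it, and the intra-subset distance as the largest nearest-neighbor distance inside the subset. The function name `class_compactness` is hypothetical.

```python
import math


def class_compactness(points, subset_idx):
    """Ratio of inter-subset to intra-subset distance (assumed definitions).

    inter: smallest distance between a subset point and an outside point.
    intra: largest distance from a subset point to its nearest neighbor
           inside the subset.
    """
    inside = [points[i] for i in sorted(subset_idx)]
    outside = [p for i, p in enumerate(points) if i not in subset_idx]
    if len(inside) < 2 or not outside:
        raise ValueError("need at least two points inside and one outside")

    inter = min(math.dist(a, b) for a in inside for b in outside)
    intra = max(
        min(math.dist(a, b) for j, b in enumerate(inside) if j != i)
        for i, a in enumerate(inside)
    )
    # Duplicate points give intra == 0; treat that as maximally compact.
    return math.inf if intra == 0 else inter / intra


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
tight = class_compactness(pts, {0, 1, 2})   # well-separated group
mixed = class_compactness(pts, {0, 3})      # points from two groups
print(tight > 1, mixed > 1)
```

Under these assumed definitions, a subset with compactness greater than 1 is farther from everything else than its own points are from each other, which matches the abstract's criterion for an isolated cluster.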
Many existing clustering algorithms use a single prototype to represent a cluster. However, it is sometimes very difficult to find a suitable prototype for representing a cluster with an arbitrary shape. One possible solution is to employ multiple prototypes instead. In this paper, we propose a minimum spanning tree (MST) based multi-prototype clustering algorithm. It is a split and merge scheme. In the...
The R-tree is widely used in spatial databases as a spatial access method. The node-split algorithm is the key sub-algorithm for generating an R-tree. In traditional methods, the one-to-two split mode is applied. However, this leads to uneven node shapes. A new node-split method is put forward. In this method, the 2-to-3 split mode is utilized based on the spatial clustering principle, which can guarantee more...
Spatial data mining is the process of identifying or extracting efficient, novel, potentially useful and ultimately understandable patterns from a spatial data set. Spatial clustering analysis is one of the most important research directions in spatial data mining. Clustering criteria implied in massive data can be discovered by spatial clustering analysis methods, which can be used to explore...
K-means clustering is sensitive to starting points, and its time cost is high for large-scale data, such as audio. Sampling is widely applied to find “better” starting points and speed up the convergence of clustering. However, how to choose a reasonable sampling rate remains a problem. In this paper, we report our initial exploration of locating reasonable sampling rates...
Clustering is a hot research field in data mining. Many methods and algorithms have been designed for the different types of data sets on which data analysis operates. In this paper, a Local Agglomerative Characteristic (LAC) based algorithm is presented for data clustering, which can handle clusters of different sizes, shapes, and densities, and can work well on different distributed and natural variant...
Content-based image retrieval relies on the use of efficient and effective image descriptors. One of the most important components of an image descriptor is concerned with the distance function used to measure how similar two images are. This paper presents a clustering approach based on distances correlation for computing the similarity among images. Conducted experiments involving shape, color,...
Due to the complexity of geoscientific data, such as geochemical data, geophysical data and digital remote sensing data, traditional data mining methods, such as cluster analysis and association analysis, have limitations in resources evaluation. In this paper, a clustering algorithm is presented which has the ability to handle clusters of arbitrary shapes, sizes and densities. For association analysis,...
In spatial data mining, the k-means algorithm is probably the most widely applied clustering method. But a major drawback of the k-means algorithm is that it is difficult to determine the parameter k that represents the natural clusters, and it is only suitable for convex spherical clusters. The paper presents an efficient clustering algorithm which combines the hierarchical approach with the grid partition...
In this paper, we present a novel clustering-based method for identifying 3D lines from point clouds, called the “self-organizing fuzzy k-means algorithm”. The algorithm automatically finds the optimal number of clusters and self-organizes the clusters based on inter/intra-cluster distances and an evaluation of each cluster's performance. The self-organizing fuzzy k-means is applied in 3D line identification from...
Data clustering is a hot problem and has been studied extensively. In this paper, we propose a novel hybrid algorithm for data clustering based on support vectors and K-Means. First, we identify the outliers and overlapping data points through the support vector approach. Second, we remove the outliers and overlapping data points and then run K-Means on the remaining data points to obtain clustered...
This paper improves the density-based clustering algorithm for data streams and proposes the Double Detection Time Strategy. The strategy maintains and deletes clusters dynamically. In addition, it preserves potential outlier points for the sake of high cluster quality and efficiency. Theory and practice show that the improved algorithm possesses good practicality and effectiveness and achieves a higher...
Density-based clustering algorithms, which are important for the task of class identification in spatial databases, have many advantages, such as no dependence on the number of clusters and the ability to discover clusters with arbitrary shapes and handle noise. However, the clustering quality of most density-based clustering algorithms degrades when the clusters are of different densities. To address...
Information visualization is essential in making sense out of large data sets. Often, high-dimensional data are visualized as a collection of points in 2-dimensional space through dimensionality reduction techniques. However, these traditional methods often do not capture well the underlying structural information, clustering, and neighborhoods. In this paper, we describe GMap, a practical algorithm...