Clustering analysis is used to explore classification for large datasets, and the Canberra distance is generalized so that it can process data with categorical attributes. Based on the generalized Canberra distance definition, an instance of constraint-based clustering is introduced. Meanwhile, nearest neighbor classification is improved. Class-labeled clusters are regarded as classifying models...
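As an illustration of the idea of extending the Canberra distance to categorical attributes, the following is a minimal sketch only. The abstract does not give the paper's generalized definition, so the 0/1 mismatch term for categorical attributes and the function name `canberra_mixed` are assumptions made for this example, not the authors' formulation.

```python
from numbers import Number


def canberra_mixed(x, y):
    """Canberra distance with an assumed 0/1 mismatch term for categorical attributes.

    Numeric attributes contribute |a - b| / (|a| + |b|), with 0/0 treated as 0.
    Non-numeric attributes contribute 0 if equal and 1 otherwise (an assumption,
    not the definition from the paper).
    """
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, Number) and isinstance(b, Number):
            denom = abs(a) + abs(b)
            if denom != 0:
                total += abs(a - b) / denom
        else:
            total += 0.0 if a == b else 1.0
    return total


# Two records with two numeric attributes and one categorical attribute.
d = canberra_mixed([1, 3, "red"], [3, 1, "blue"])
```

Each numeric term is bounded by 1, so a mismatch cost of 1 keeps categorical and numeric attributes on the same scale in this sketch.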
Outlier detection is a branch of data mining with important applications in domains such as financial fraud detection and network intrusion analysis. Most of these applications, however, involve high-dimensional data. Many algorithms use the concept of proximity to find outliers based on their relationship to the rest of the data set. However, the sparsity of high-dimensional points results in the algorithms being...
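To make the notion of proximity-based outlier detection concrete, here is a small sketch that scores each point by the distance to its k-th nearest neighbor. This is a generic textbook-style baseline chosen for illustration, not the method of the paper; the function name `knn_outlier_scores` and the sample data are hypothetical.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores indicate points that lie far from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


points = [(0, 0), (0, 1), (1, 0), (10, 10)]
scores = knn_outlier_scores(points, k=1)
most_outlying = max(range(len(points)), key=lambda i: scores[i])
```

In high-dimensional data, pairwise distances tend to become similar to one another, which is one reason scores of this kind lose discriminating power.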
A new clustering algorithm called LOCAHID is presented in this paper. LOCAHID views each potential cluster as a tightly coupled structure, which can be described by a density tree. Every density tree is dynamically generated according to its local density distribution. Clusters that are "closer" are merged if certain conditions are satisfied. In order to extend its applications to large...
Clustering of binary fingerprints is used in the classification of gene expression data. It is known that clustering binary fingerprints with 3 bits of missing values is NP-hard. The greedy clique partition (GCP for short) algorithm is a heuristic algorithm used for clustering binary fingerprints with missing values. In this paper, we first study the feature of instances which cannot be...
Speaker clustering is used in serial-structure speaker identification systems to reduce algorithmic delay and computational complexity. The speech is first classified into a speaker group, and the most likely speaker is then searched for within that group. The difference between Gaussian mixture models (GMMs) is widely applied in speaker classification. The paper proposes a novel measure based on pseudo-divergence,...
Clustering-based image segmentation is a popular approach in image processing. It consists of separating pixel features into clusters representing homogeneous regions. In this kind of method, determining the number of clusters is an open problem. In this paper, we propose an efficient model selection algorithm for automatically determining the number of clusters. The algorithm roots the trial-and-error approach...
DBSCAN is a typical clustering algorithm that can discover clusters of arbitrary shape and handle noise well. However, it is comparatively slow because of the neighborhood query performed for each object, and setting the density threshold properly is difficult. In this paper, a fast density-based clustering algorithm based on DBSCAN is presented. After sorting objects by the coordinates of a certain dimension,...
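For reference, the baseline DBSCAN procedure that the abstract builds on can be sketched as follows. This is a plain, brute-force version of standard DBSCAN, not the fast variant proposed in the paper (whose details are truncated above); the names `region_query`, `eps`, and `min_pts` and the sample points are choices made for this example.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The call to `region_query` for every object is the per-object neighborhood query that the abstract identifies as the source of slowness; in this brute-force form each query scans all points.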