We propose EC3, a novel algorithm that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using a convex optimization function. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental...
This paper considers the ship route extraction and clustering problem based on Automatic Identification System (AIS) data. For ships with a known Maritime Mobile Service Identity (MMSI), we propose a ship route extraction method using AIS data. For ship route clustering, a hierarchical clustering method is selected. We first define a distance between ship routes to measure their dissimilarity...
Mutual information clustering is an agglomerative hierarchical clustering method that has been used to group random variables or sets thereof. Some researchers have found that the normalization method used can lead to oddly-sized clusters that do not line up with expected results. We introduce a new normalization parameter to control the size of the clusters, and apply it to food allergy data from...
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may...
Clustering is an effective method for data analysis and can be exploited to uncover unknown features of data samples; its applications range from data mining to bioinformatics analysis. Several clustering approaches have been proposed in order to obtain a better trade-off between accuracy and efficiency of the clustering process. It is well-known that no existing clustering algorithm completely satisfies...
Biological data is often represented as networks, as in the case of protein-protein interactions and metabolic pathways. Modeling, analyzing, and visualizing networks can help make sense of large volumes of data generated by high-throughput experiments. However, due to their size and complex structure, biological networks can be difficult to interpret without further processing. Cluster analysis is...
Clustering is an important unsupervised data analysis technique, which divides data objects into clusters based on similarity. Clustering has been studied and applied in many different fields, including pattern recognition, data mining, decision science and statistics. Clustering algorithms can be mainly classified as hierarchical and partitional clustering approaches. Partitioning around medoids...
This survey highlights issues in clustering that hinder achieving an optimal solution or generate inconsistent outputs. We call such malignancies dark patches. We focus on the issues relating to clustering rather than on the concepts and techniques of clustering. For better insight into these issues, we categorize dark patches into three classes and then compare various clustering methods...
A wireless sensor network (WSN) is an inexpensive, recently developed technology with many applications in various fields (such as biology, the environment, warfare, and natural disasters). Such a network consists of a large number of sensor nodes that collect information from the environment in a distributed manner. The main limitations include limited energy, low communication capacity, low storage volume, and low...
Localization of a viewer's region of interest (ROI) on eye gaze signal trajectories acquired by eye trackers is a widely used approach in scene analysis, image compression, and quality of experience assessment. In this paper, we propose a novel clustering approach for ROI estimation from potentially noisy raw eye gaze data, based on signal processing on graphs. The clustering approach adapts graph...
Clustering is a popular method for mode identification in multimode processes. Unlike traditional distance-based clustering methods, in this paper, a new correlation-based bi-partition hierarchical clustering (CBHC) method is proposed, which classifies the observations according to their correlation relationships rather than their distances. Motivated by an existing correlation-based...
Spectral clustering is one of the most effective methods of data mining, in which the adjacency matrix is constructed by using the similarity matrix. In this paper, to extend spectral clustering method for uncertain data clustering, we propose a new spectral clustering method based on JS-divergence. In the proposed method, the JS-divergence is used to construct the adjacency matrix in the spectral...
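As a rough illustration of the idea in the abstract above, the following sketch computes the Jensen-Shannon (JS) divergence between discrete probability distributions and turns the pairwise divergences into a symmetric matrix that could serve as the adjacency matrix for spectral clustering. The abstract is truncated and does not say how the divergence is converted into a similarity, so the Gaussian-style kernel, the `sigma` parameter, the zero diagonal, and the representation of each uncertain object as a discrete distribution are all assumptions made for this example, not the authors' construction.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    # The mixture m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_affinity_matrix(distributions, sigma=0.5):
    """Build a symmetric affinity matrix from pairwise JS divergences.

    Assumption: the kernel exp(-d / (2 * sigma**2)) is a hypothetical choice
    for mapping a divergence d to a similarity; the diagonal is left at zero.
    """
    n = len(distributions)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = js_divergence(distributions[i], distributions[j])
            W[i][j] = W[j][i] = math.exp(-d / (2 * sigma ** 2))
    return W


# Three hypothetical uncertain objects, each given as a discrete distribution.
objects = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
W = js_affinity_matrix(objects)
```

In this toy input the first two distributions are close to each other and far from the third, so `W[0][1]` comes out larger than `W[0][2]`. The resulting matrix could then be passed to any standard spectral clustering routine that accepts a precomputed affinity.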
The dynamometer card is a primary tool for analyzing the downhole working conditions of a beam pumping unit in operation. For computer-based diagnosis, a method based on 16-direction chain codes and K-means clustering is proposed in this paper. First, the 16-direction chain codes are used to recreate the boundary contour curve of the dynamometer card; then seven feature vectors which can accurately...
Image clustering is a crucial but challenging task in machine learning and computer vision. Existing methods often ignore the combination between feature learning and clustering. To tackle this problem, we propose Deep Adaptive Clustering (DAC) that recasts the clustering problem into a binary pairwise-classification framework to judge whether pairs of images belong to the same clusters. In DAC, the...
Currently, the government still has difficulty distributing teachers. The problem is not just a shortage of teachers, but also a surplus of teachers in some cities. The problem of unequal teacher distribution has therefore become dependent on local government. The distribution of teachers cannot currently be centralized because of the decentralization system implemented in Indonesia. Clustering in data...
Clustering analysis is an active research branch in the area of data mining due to its simplicity and rapidity. However, the K-means algorithm has the shortcomings of depending heavily on the initial clustering centers and easily falling into a local optimum. In this paper, we study the optimization of the K-means algorithm in depth. We put forward the first selected initial clustering center of K-means...
An important research topic in recent years has been to understand and analyze manifold-modeled data for clustering and classification applications. Most clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate difficult sampling conditions only...
More and more sophisticated malware attacks are developed nowadays and new variants of existing malicious software are released daily. Malware clustering is often applied to identify patterns of malicious software, with similar samples being grouped together and considered variants of the same malware family. In this paper we propose an automated technique based on agglomerative hierarchical clustering...
We present a density-based clustering method producing a covering of the dataset by ellipsoidal structures in order to detect possibly entangled clusters. We first introduce an unconstrained version of the algorithm which does not require any assumption on the number of clusters. Then a constrained version using a priori knowledge to improve the bare clustering is discussed. We evaluate the performance...
Aiming at the multiple attribute decision making problem with three-parameter interval grey numbers, a grey-incidence clustering decision making method based on regret theory is proposed in this paper. First, according to the idea of TOPSIS method, a kind of comprehensive grey interval incidence coefficient of three-parameter interval grey number is defined, and the “regret-rejoice” value is calculated...