Biological data is often represented as networks, as in the case of protein-protein interactions and metabolic pathways. Modeling, analyzing, and visualizing networks can help make sense of large volumes of data generated by high-throughput experiments. However, due to their size and complex structure, biological networks can be difficult to interpret without further processing. Cluster analysis is...
Clustering is an important unsupervised data analysis technique, which divides data objects into clusters based on similarity. Clustering has been studied and applied in many different fields, including pattern recognition, data mining, decision science and statistics. Clustering algorithms can be mainly classified as hierarchical and partitional clustering approaches. Partitioning around medoids...
Cluster analysis is an active research area in data mining owing to its simplicity and speed. However, the K-means algorithm depends heavily on the initial cluster centers and easily falls into a local optimum. In this paper, we study the optimization of the K-means algorithm in depth. We put forward the first selected initial clustering center of K-means...
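The sensitivity to initial centers mentioned in the abstract above can be illustrated with a minimal sketch. It uses plain Lloyd-style K-means on one-dimensional toy data, not the paper's optimized method; the function name `kmeans_1d`, the data, and both initializations are assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd-style K-means on 1-D data (illustrative sketch only)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared errors to the nearest final center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data with three well-separated groups.
data = [1, 2, 3, 10, 11, 12, 20, 21, 22]

# One initial center per group versus all initial centers in the first group.
good_centers, good_sse = kmeans_1d(data, [1, 10, 20])
bad_centers, bad_sse = kmeans_1d(data, [1, 2, 3])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

On this toy data the second initialization converges to a partition with a much larger squared error, which is the local-optimum behaviour the abstract refers to.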
More and more sophisticated malware attacks are developed nowadays and new variants of existing malicious software are released daily. Malware clustering is often applied to identify patterns of malicious software, with similar samples being grouped together and considered variants of the same malware family. In this paper we propose an automated technique based on agglomerative hierarchical clustering...
Cluster analysis aims at classifying data elements into different categories according to their similarity. It is a common task in data mining and is useful in various fields, including pattern recognition, machine learning, and information retrieval. As it is an extensively studied area, many clustering methods have been proposed in the literature. Among them, some methods are focused on mining clusters with arbitrary...
Clustering is an important task in data mining, especially for continuous streams of data, i.e. "data streams". However, some characteristics of this kind of data are neglected by existing clustering approaches. The similarity between entities in the temporal dimension is underestimated. A forgetting mechanism is adopted to remove old patterns and save computational resources. However,...
Clustering ensembles have attracted much attention in recent years. In many clustering ensemble frameworks, simple partitional clustering methods, e.g., the well-known k-means, are used as the ensemble's member "clusterers" because of their low computational complexity. These ensemble approaches extend the scope of application of individual clustering algorithms, and improve the robustness...
Telecommunications fraud, a new type of crime, has shown a rising trend in recent years. However, research that detects such fraud from a data mining perspective is scarce, especially research that considers behavioral sequences. Although a call detail record (CDR) in telecommunications is generally a snapshot, the history of a caller/callee can be treated as a sequence. Indeed, the historical calling sequences...
The wide and increasing deployment of smart meters at the household level is making it possible to better understand how residential consumers use electricity. This contributes to improving the quality of the services provided to consumers and to developing new planning and operation strategies for power grid managers. This paper analyses the discovery of patterns in the use...
Fuzzy C-Means (FCM) is the most popular algorithm of the fuzzy clustering approach. Although FCM and its variations have shown good performance in cluster detection, they do not consider that different variables could produce different membership degrees. Motivated by this, the Multi-variate Fuzzy C-Means (MFCM) method was proposed. The MFCM computes membership degrees of both clusters and variables...
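As a point of reference for the abstract above, the following minimal sketch computes the standard FCM membership degrees of a single one-dimensional point for fixed cluster centers. It covers only classical FCM, not the MFCM variant the paper proposes; the function name `fcm_memberships`, the fuzzifier value `m=2`, and the example numbers are assumptions.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (1-D sketch).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i = |x - c_i|.
    """
    dists = [abs(x - c) for c in centers]
    # If the point coincides with one or more centers, share full membership
    # among those centers to avoid division by zero.
    zeros = [i for i, d in enumerate(dists) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical example: a point at 2 with cluster centers at 0 and 6.
u = fcm_memberships(2.0, [0.0, 6.0])
print(u)
```

The memberships sum to one, and the closer center receives the larger degree. In classical FCM a point has a single membership degree per cluster, shared by all variables, which is the limitation the abstract describes.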
Credit scoring plays an important role in financial institutions and debt based crowdfunding platforms as well as peer to peer lending platforms. In the last few years, adopting ensemble methods for credit scoring has become much more popular. However, the performance of ensemble methods is easily affected by the parameter settings and the number of base classifiers. Ensemble classification based...
In this paper, an adaptive hierarchical clustering method based on the DBSCAN algorithm is proposed to extract information more effectively from Automatic Identification System data and to assess the traffic situation on waterways systematically. To deal with the uneven distribution of ship trajectories, the paper proposes a method of hierarchical clustering and a statistical method to determine parameters according...
Consensus clustering, also known as clustering ensembles, is a technique that combines multiple clustering solutions to obtain stable, accurate and novel results. In recent years, several consensus clustering approaches have been proposed that address practical clustering problems with varying degrees of success. In this paper, we consider data fragments as elements of a cluster ensemble framework. We...
In this paper we propose an improved color-based K-means algorithm for clustering satellite (SAR) images. Image clustering is a versatile method that can help with the challenging task of efficient search in very fast-growing image databases. It plays an important role in image analysis, pattern recognition, image segmentation, etc. Our method comprises two stages. The first step is the calculation...
Conventional clustering algorithms are based on the assumption that a data point can be assigned to only a single cluster. However, in several types of data a data point belongs to multiple categories, which causes the ground-truth clusters to overlap. To handle this situation, several algorithms have been proposed, referred to as "overlapping clustering". One of state-of-the-art partition-based overlapping...
The main objective of clustering is to group similar data objects into clusters. Cluster analysis aims to group a collection of patterns into clusters based on similarity. Clustering is an unsupervised learning technique used to group a set of unordered data objects into a smaller number of meaningful clusters. The relations between clusters are either intra-cluster or inter-cluster. Clustering...
By exploring alternative approaches to combinatorial optimization, we propose the first known formal connection between clustering and set partitioning, with the goal of identifying a subclass of set partitioning problems that can be solved efficiently and with optimality guarantees through a clustering approach. We prove the equivalence between classical centroid clustering problems and a special...
Clustering is a crucial task for massive data that continuously arrive and evolve over time, generated as a stream. However, data may be pervaded by uncertainty and imprecision, and techniques that perform unsupervised learning on imperfect data sets are unable to deal with such an evolving environment. On the other hand, standard methods for clustering data streams are not adapted to an uncertain...
Fuzzy joint points (FJP) is a method that uses a fuzzy neighborhood notion to deal with the neighborhood parameter selection issue of classical density-based clustering and offers an unsupervised clustering tool. Recent works have improved the speed of the method to make it suitable for big data applications. However, the space efficiency of the method is still a limiting factor. In this work, we discuss...
This work introduces a hard clustering algorithm based on the Particle Swarm Optimization (PSO) metaheuristic that is able to partition objects considering their relational descriptions given by a single dissimilarity matrix. PSO is a population-based metaheuristic that is well known for its simplicity and good performance, and it has already been adapted as a clustering algorithm for vector data. The proposed...