In this paper, a configurable many-core hardware/software architecture is proposed to efficiently execute the widely used K-means clustering algorithm. A prototype was designed and implemented on a Xilinx Zynq-7000 All Programmable SoC. A single core in its slowest configuration achieves a 10× speed-up compared to the software-only solution. The system is fully scalable...
Due to the growing presence of large-scale and streaming graphs such as social networks, graph sampling and clustering play an important role in many real-world applications. One key aspect of graph clustering is the evaluation of cluster quality. However, little attention has been paid to evaluation measures for clustering quality on samples of graphs. As first steps towards appropriate evaluation...
Finding frequent patterns is an important problem in data mining. We have devised a method for detecting frequent patterns in event log data. By representing events in a graph structure, we can generate clusters of frequently co-occurring events. This method is compared with basic association mining techniques and found to give a “macro-level” overview of patterns, which is more interpretable. In...
The process of clustering similar words is crucial for a broad range of applications such as text classification and word sense disambiguation. Several approaches for deriving word similarity have been proposed. Some, like latent semantic analysis, are derived from the distributional hypothesis. Others extract relationships between terms by drawing upon predefined linguistic patterns. In this work,...
In the present world, it is hard to overlook the omnipresence of networks. Whether the subject is internet structure, mobile networks, protein interactions, or social networks, all of these fields rely heavily on network and graph studies. Social network analysis is an emerging field with community detection as its key task. A community in a network is a group of nodes in which the density of links...
Online shopping is a common way for people to shop nowadays. Most shopping sites provide rating mechanisms. Therefore, predicting which products a customer is going to buy next from the rating information becomes possible, making recommender systems important for online shopping. The success of an online shopping site can depend heavily on the quality of the recommender system...
Clustering is a fundamental tool for data analysis. Typically, all attributes of the data are used for clustering. However, if a set of attributes can be divided into meaningful subsets, it may be effective to cluster the data for each subset. In this paper, we propose a method for dividing the set of elements of feature vectors into meaningful subsets. Considering the dependencies between the elements,...
In this paper, we consider the Fisher-Rao distance in the space of multivariate diagonal Gaussian distributions for clustering methods. Centroids in this space are derived and used to introduce two clustering algorithms for diagonal Gaussian mixture models associated with this metric: k-means and hierarchical clustering. These algorithms make it possible to reduce the number of components of such mixture...
The detection of node clusters (communities) in graphs has been at the core of many modeling paradigms emerging in different fields and disciplines such as Social Sciences, Biology, Chemistry, Telecommunications and Linguistics. When evaluating the quality of a clustering arrangement, unsupervised metrics (e.g. modularity) can be used, all of which rely on structural and topological characteristics...
Traditional hierarchical clustering (HC) methods are not scalable with the size of databases. To address this issue, a series of summarization techniques, i.e. data bubbles (DB) and its improved versions, have been proposed to compress very large databases into representative seed points suitable for subsequent hierarchy construction. However, DB and its variants have two common drawbacks: 1) their...
Clustering of load patterns (LPs) has a broad range of applications, such as tariff formulation, power system planning, and load forecasting. In this paper, we develop a multi-objective version of Differential Evolution (DE) using a Pareto Tournament (PT) selection to solve the LP clustering problem. Our automatic DE LP clustering (ADE-LPC) algorithm provides an entire Pareto front, and by incorporating...
It is well known that clustering is an unsupervised machine learning technique. However, most clustering methods require several parameters to be set, such as the number of clusters, the shape of clusters, or other user- or problem-specific parameters and thresholds. In this paper, we propose a new clustering approach which is fully autonomous, in the sense that it does not require parameters to be pre-defined...
In multi-label classification, implicit constraints and dependencies always exist among labels. Exploring the correlation information among different labels is important for many applications. It can not only enhance classifier performance but also help to interpret the classification results for some specific applications. This paper presents an improved multi-label classification...
With the rapid development of technology, acquiring and storing big data from various fields is no longer a problem. Instead, how to utilize the data has become an important and active research topic. Clustering is one of the important tasks in big data utilization. However, the task faces one well-known challenge: it is difficult to incorporate prior information into the clustering results...
Online short texts on hot topics submitted to social media by users can provide valuable personal opinions, which are useful for service providers and individuals. However, it is difficult for readers to grasp the main opinions in a massive number of short texts. In this paper, to cope with the challenge of summarizing short texts, we propose a novel approach, which makes full use of BM25 to weight each short...
Degree distribution, hierarchy, clustering and the small-world property are typical graph features used to measure the structure of complex networks. This paper uses these features to compare and evaluate the performance of biased and unbiased sampling algorithms on two types of scale-free networks. The numerical analysis verifies that the biased sampling performs better than the unbiased sampling on networks in which...
A Mobile Ad hoc Network (MANET) is a multi-hop wireless network in which the mobile nodes are dynamic in nature and have limited bandwidth and battery power. Given this challenging environment, the mobile nodes can be grouped into clusters to achieve better stability and scalability. Grouping the mobile nodes is called clustering, in which a leader node is elected to manage the entire network...
Superpixel segmentation is becoming increasingly popular in the fields of computer vision and image processing. Simple linear iterative clustering (SLIC) is widely used due to its high segmentation accuracy and low computational complexity. In this paper, we propose a variance adaptive SLIC (VASLIC) algorithm. The compactness factor of the proposed algorithm is determined according to the image neighbourhood...
Communities play an important role in the study of graph structure, especially in network analysis. A community (also referred to as a cluster) is a dense subgraph of the whole graph with more links between its members than between its members and outside nodes. Communities overlap when nodes in the graph belong to multiple communities. Overlapping community detection is developing in...
The seminal works by Karger [13], [14] have shown that one can use Uniform Random Edge (URE) sampling to generate a graph skeleton which accurately approximates all cut-values in the original graph with high probability under some specific assumptions. As such, the random subgraphs resulting from URE sampling can often be used as substitutes for the original graphs in cut/flow-related graph-optimization...