We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting...
We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results...
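The two-phase structure described in this abstract can be illustrated with a minimal, single-process sketch. It makes several assumptions that are not in the source: plain Python stands in for both CUDA and MPI, each "node" is just a list slice, the pairwise tree merge is a simple stand-in for the paper's reduction sorting network, and the names `bitonic_sort`, `tree_merge`, and `hybrid_sort` are hypothetical.

```python
import heapq
import math


def bitonic_sort(values):
    """Sort numbers with an iterative bitonic network (stand-in for the per-GPU sort)."""
    n = 1
    while n < len(values):
        n *= 2
    # Pad to a power of two with +inf so the padding ends up at the tail.
    a = list(values) + [math.inf] * (n - len(values))
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a[:len(values)]


def tree_merge(runs):
    """Merge sorted runs pairwise, level by level (stand-in for the cross-node reduction)."""
    runs = list(runs)
    if not runs:
        return []
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(list(heapq.merge(runs[i], runs[i + 1])))
            else:
                merged.append(runs[i])  # odd run out advances unchanged
        runs = merged
    return runs[0]


def hybrid_sort(data, num_nodes):
    """Phase 1: sort one chunk per simulated node. Phase 2: merge the sorted chunks."""
    chunk = max(1, -(-len(data) // num_nodes))  # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    local_runs = [bitonic_sort(part) for part in parts]
    return tree_merge(local_runs)
```

In a real deployment the inner loop of `bitonic_sort` would run as parallel compare-exchange steps on the GPU, and each level of `tree_merge` would correspond to one round of communication between cluster nodes; the sketch only shows how the data moves through the two phases.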
A latency-hiding algorithm for the parallelization of large-scale agent-based model simulations (ABMS) on parallel/distributed computing platforms is proposed. The key idea of this algorithm is to use redundant computations to hide communication latencies. An analytical model for this algorithm is presented to determine how to select the value of R that yields the best speedup. Compared to B+2R algorithm [1], theoretical...
Data clustering is a distinctive method for analyzing complex networks in terms of functional relationships of the comprising elements. A number of graph-based algorithms have been proposed so far to tackle the complexity of the problem and many of them are based on the representation of data in the form of a minimum spanning tree (MST). In this work, we propose a graph-based agglomerative clustering...
The Fast Multipole Method (FMM) allows $O(N)$ evaluation to any arbitrary precision of $N$-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures \cite{Qi11:SC11}. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong...
The construction of phylogenetic trees is important in computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by UPGMA is often not a true tree unless the molecular clock assumption holds, the UT is still useful for clocklike data. However,...
The size and interconnectedness of social networks continue to increase. As a result, finding communities or subsets of like nodes within these large networks has become a resource-intensive endeavor. In this paper, we characterize community-finding organized on the basis of network/set properties, and describe an agglomerative algorithm called egocentric community finding. The primary contribution...
Change detection is an important technique in the area of damage assessment. As the amount of remote sensing images and the complexity of algorithms rise, the demand for processing power is increasing. In this paper, we propose PLog-FLCM, a parallel algorithm for change detection. It is implemented on AMD Accelerated Parallel Processing (APP) SDK v2 based on Open Computing Language. The parallel characteristics...
The Gaussian mixture model (GMM) is a widely used probabilistic clustering model. The incremental learning algorithm of GMM is the basis of a variety of complex incremental learning algorithms. It is typically applied to real-time or massive data problems where the standard Expectation-Maximization (EM) algorithm does not work. But the output of the incremental learning algorithm may exhibit degraded cluster...
Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori; however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC...
K-Means is a clustering algorithm that is widely used in many areas such as information retrieval, computer vision and pattern recognition. With the recent advances in General Purpose Graphics Processing Units (GPGPU), we can use a modern GPU, which is capable of computation at up to Tflops rates, to calculate K-Means clustering on average problems. However, due to the exponential growth of data, the K-Means...
In this paper, a GPU-based hot term extraction algorithm is presented. Graphics Processing Units (GPUs) are designed for data-parallel computations. Compared to running a single program with multiple data on a CPU, a GPU can execute faster. The hot term is defined as a word that appears frequently in the search result. We assume that the greater the frequency of appearance of a term, the more the...
We investigate the use of graphics processing units (GPUs) in accelerating PageRank computation. We first introduce a compact web graph representation which requires much less memory allocation than the well-known compressed sparse row format. The web graph is then simply partitioned into smaller chunks to fit the GPUs' device memory. We propose a fast PageRank algorithm to run on the GPU cluster. The...
K-Means is a popular clustering algorithm with wide applications in Computer Vision, Data Mining, Data Visualization, etc. Clustering is an important step for indexing and searching of documents, images, video, etc. Clustering large numbers of high-dimensional vectors is very computationally intensive. In this paper, we present the design and implementation of the K-Means clustering algorithm on the modern...
As the size and complexity of scientific problems and datasets grow, scientists from a broad range of discipline areas are relying more and more on computational methods and simulations to help solve their problems. This paper presents a summary of heterogeneous algorithms and applications that have been developed by a large research organization (CSIRO) for solving practical and challenging science...
Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing...
Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its $O(n^2)$ complexity. As the number of data and feature dimensions grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last...
We explore the capabilities of today's high-end graphics processing units (GPUs) on desktops to efficiently perform hierarchical agglomerative clustering (HAC) through partitioning of data. Traditional HAC has high time and memory complexities leading to low clustering efficiencies. We reduce time and memory bottlenecks of the traditional HAC algorithm by exploring the performance capabilities of the...
General Purpose Graphics Processing Units (GPGPUs) are rapidly becoming an integral part of high performance system architectures. The Tianhe-1A and Tsubame systems received significant attention for their architectures that leverage GPGPUs. Increasingly many scientific applications that were originally written for CPUs using MPI for parallelism are being ported to these hybrid CPU-GPU clusters. In...
In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla T10 graphics processor and demonstrate up to 100X speedup as compared with several state-of-the-art FIM algorithms on a CPU. In order to map the Apriori algorithm onto the SIMD execution model, we have designed a "static bitset" memory...