Search results

article

K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps

Kai J. Kohlhoff, Vijay S. Pande, Russ B. Altman

IEEE Transactions on Parallel and Distributed Systems > 2013 > 24 > 8 > 1602 - 1612

We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting...

chapter

A CUDA-MPI Hybrid Bitonic Sorting Algorithm for GPU Clusters

Sam White, Niels Verosky, Tia Newhall

2012 41st International Conference on Parallel Processing Workshops > 588 - 589

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results...

chapter

A Latency-Hiding Algorithm for ABMS on Parallel/Distributed Computing Environment

Li-li Chen, Jian-xin Huang, Jing Zhang

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation > 187 - 189

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (PADS)

A latency-hiding algorithm for the parallelization of large-scale agent-based model simulations (ABMS) on parallel/distributed computing platforms is proposed. The key idea of this algorithm is to use redundant computations to hide communication latencies. An analytical model for this algorithm is presented to show how to select the R value that reaches the best speedup. Compared to the B+2R algorithm [1], theoretical...

chapter

kNN-MST-Agglomerative: A fast and scalable graph-based data clustering approach on GPU

Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta, Pablo Moscato

2012 7th International Conference on Computer Science & Education (ICCSE) > 585 - 590

2012 7th International Conference on Computer Science & Education (ICCSE 2012)

Data clustering is a distinctive method for analyzing complex networks in terms of functional relationships of the comprising elements. A number of graph-based algorithms have been proposed so far to tackle the complexity of the problem and many of them are based on the representation of data in the form of a minimum spanning tree (MST). In this work, we propose a graph-based agglomerative clustering...

chapter

Scalable Distributed Fast Multipole Methods

Qi Hu, Nail A. Gumerov, Ramani Duraiswami

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 270 - 279

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The Fast Multipole Method (FMM) allows $O(N)$ evaluation to any arbitrary precision of $N$-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures \cite{Qi11:SC11}. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong...

chapter

Parallel UPGMA Algorithm on Graphics Processing Units Using CUDA

Yu-Rong Chen, Che Lun Hung, Yu-Shiang Lin, Chun-Yuan Lin, more

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 849 - 854

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The construction of phylogenetic trees is important in computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by UPGMA is often not a true tree unless the molecular clock assumption holds, the UT is still useful for clocklike data. However,...

chapter

High-performance implementations of a clustering algorithm for finding network communities

Alex Restrepo, Andres Solano, Jerry Scripps, Christian Trefftz, more

2012 IEEE International Conference on Electro/Information Technology > 1 - 6

2012 IEEE International Conference on Electro/Information Technology (EIT 2012)

The size and interconnectedness of social networks continue to increase. As a result, finding communities or subsets of like nodes within these large networks has become a resource-intensive endeavor. In this paper, we characterize community-finding organized on the basis of network/set properties, and describe an agglomerative algorithm called egocentric community finding. The primary contribution...

chapter

Parallel Multi-Temporal Remote Sensing Image Change Detection on GPU

Huming Zhu, Yu Cao, Zhiqiang Zhou, Maoguo Gong

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1898 - 1904

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Change detection is an important technique in the damage assessment area. As the number of remote sensing images and the complexity of algorithms rise, the demand for processing power is increasing. In this paper, we propose PLog-FLCM, a parallel algorithm for change detection. It is implemented on AMD Accelerated Parallel Processing (APP) SDK v2 based on Open Computing Language. The parallel characteristics...

chapter

A GPU-accelerated Approximate Algorithm for Incremental Learning of Gaussian Mixture Model

Chunlei Chen, Dejun Mu, Huixiang Zhang, Bo Hong

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1937 - 1943

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

The Gaussian mixture model (GMM) is a widely used probabilistic clustering model. The incremental learning algorithm of GMM is the basis of a variety of complex incremental learning algorithms. It is typically applied to real-time or massive data problems where the standard Expectation-Maximization (EM) algorithm does not work. But the output of the incremental learning algorithm may exhibit degraded cluster...

chapter

Efficient Quality Threshold Clustering for Parallel Architectures

Anthony Danalis, Collin McCurdy, Jeffrey S. Vetter

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 1068 - 1079

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori; however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC...

chapter

Dynamic load balancing on GPU clusters for large-scale K-Means clustering

Ekasit Kijsipongse, Suriya U-ruekolan

2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE) > 346 - 350

2012 International Joint Conference on Computer Science and Software Engineering (JCSSE)

K-Means is a clustering algorithm that is widely used in many areas such as information retrieval, computer vision and pattern recognition. With recent advances in General Purpose Graphics Processing Units (GPGPU), we can use a modern GPU, which is capable of computation at up to Tflops, to calculate K-Means clustering on average problems. However, due to the exponential growth of data, the K-Means...

chapter

GPU Accelerated Hot Term Extraction from User Generated Content

Ming-Fung Cheng, Korris Fu-Lai Chung, Siu-Nam Chuang

2012 26th International Conference on Advanced Information Networking and Applications Workshops > 851 - 856

2012 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA)

In this paper, a GPU-based hot term extraction algorithm is presented. Graphics Processing Units (GPUs) are designed for data-parallel computations. Compared to running a single program with multiple data on a CPU, a GPU can have faster execution. The hot term is defined as a word that appears frequently in the search result. We assume that the greater the frequency of appearance of a term, the more the...

chapter

Fast PageRank Computation on a GPU Cluster

Arnon Rungsawang, Bundit Manaskasemsak

2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing > 450 - 456

2012 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

We investigate the use of graphics processing units (GPUs) in accelerating PageRank computation. We first introduce a compact web graph representation which requires much less memory allocation than a well-known compressed sparse row format. The web graph is then simply partitioned into smaller chunks to fit the GPUs' device memory. We propose a fast PageRank algorithm to run on the GPU cluster. The...

chapter

Scalable clustering using multiple GPUs

Mohiuddin K. Wasif, P. J. Narayanan

2011 18th International Conference on High Performance Computing > 1 - 10

2011 18th International Conference on High Performance Computing (HiPC)

K-Means is a popular clustering algorithm with wide applications in Computer Vision, Data Mining, Data Visualization, etc. Clustering is an important step for indexing and searching of documents, images, video, etc. Clustering large numbers of high-dimensional vectors is very computationally intensive. In this paper, we present the design and implementation of the K-Means clustering algorithm on the modern...

chapter

Applications of Heterogeneous Computing in Computational and Simulation Science

Luke Domanski, Tomasz Bednarz, Tim E. Gureyev, Lawrence Murray, more

2011 Fourth IEEE International Conference on Utility and Cloud Computing > 382 - 389

2011 IEEE 4th International Conference on Utility and Cloud Computing (UCC 2011)

As the size and complexity of scientific problems and datasets grow, scientists from a broad range of discipline areas are relying more and more on computational methods and simulations to help solve their problems. This paper presents a summary of heterogeneous algorithms and applications that have been developed by a large research organization (CSIRO) for solving practical and challenging science...

chapter

Parallel clustering for visualizing large scientific line data

Jishang Wei, Hongfeng Yu, Jacqueline H. Chen, Kwan-Liu Ma

2011 IEEE Symposium on Large Data Analysis and Visualization > 47 - 55

2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV)

Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing...

chapter

The GPU Enhanced Parallel Computing for Large Scale Data Clustering

Xiaohui Cui, Jesse St. Charles, Thomas E. Potok

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery > 220 - 225

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)

Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its complexity $O(n^2)$. As the number of data and feature dimensions grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last...

chapter

Efficient Hierarchical Agglomerative Clustering Algorithms on GPU Using Data Partitioning

S.A. Arul Shalom, Manoranjan Dash

2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies > 134 - 139

2011 12th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT)

We explore the capabilities of today's high-end graphics processing units (GPUs) on desktops to efficiently perform hierarchical agglomerative clustering (HAC) through partitioning of data. Traditional HAC has high time and memory complexities leading to low clustering efficiencies. We reduce time and memory bottlenecks of the traditional HAC algorithm by exploring the performance capabilities of the...

chapter

MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit

Ashish Kumar Singh, Sreeram Potluri, Hao Wang, Krishna Kandalla, more

2011 IEEE International Conference on Cluster Computing > 420 - 427

2011 IEEE International Conference on Cluster Computing (CLUSTER)

General Purpose Graphics Processing Units (GPGPUs) are rapidly becoming an integral part of high performance system architectures. The Tianhe-1A and Tsubame systems received significant attention for their architectures that leverage GPGPUs. Increasingly many scientific applications that were originally written for CPUs using MPI for parallelism are being ported to these hybrid CPU-GPU clusters. In...

chapter

GPApriori: GPU-Accelerated Frequent Itemset Mining

Fan Zhang, Yan Zhang, Jason Bakos

2011 IEEE International Conference on Cluster Computing > 590 - 594

2011 IEEE International Conference on Cluster Computing (CLUSTER)

In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla T10 graphics processor and demonstrate up to 100X speedup as compared with several state-of-the-art FIM algorithms on a CPU. In order to map the Apriori algorithm onto the SIMD execution model, we have designed a "static bitset" memory...

INFONA - science communication portal
