Search results

Items from 1 to 14 out of 14 results

chapter

A Batched GPU Algorithm for Set Intersection

Di Wu, Fan Zhang, Naiyong Ao, Fang Wang, et al.

2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks > 752 - 756

2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN 2009)

Intersection of inverted lists is a frequently used operation in search engine systems. Efficient CPU and GPU intersection algorithms for large problem size are well studied. We propose an efficient GPU algorithm for high performance intersection of inverted index lists on CUDA platform. This algorithm feeds queries to GPU in batches, thus can take full advantage of GPU processor cores even if problem...

chapter

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms

Bo Wang, Tianji Wu, Feng Yan, Ruirui Li, et al.

2009 15th International Conference on Parallel and Distributed Systems > 284 - 291

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

NVIDIA CUDA and ATI Stream are the two major general-purpose GPU (GPGPU) computing technologies. We implemented RankBoost, a web relevance ranking algorithm, on both NVIDIA CUDA and ATI Stream platforms to accelerate the algorithm and illustrate the differences between these two technologies. It shows that the performances of GPU programs are highly dependent on the utilization of GPU's hardware memory...

chapter

Parallel Lexicographic Names Construction with CUDA

Weidong Sun, Zongmin Ma

2009 15th International Conference on Parallel and Distributed Systems > 913 - 918

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

The suffix array is a simpler and more compact alternative to the suffix tree, and lexicographic name construction is the fundamental building block in the suffix array construction process. This paper depicts the design issues of the first data parallel implementation of the lexicographic name construction algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both...

chapter

Count Sort for GPU Computing

Weidong Sun, Zongmin Ma

2009 15th International Conference on Parallel and Distributed Systems > 919 - 924

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

Counting sort is a simple, stable and efficient sort algorithm with linear running time, which is a fundamental building block for many applications. This paper depicts the design issues of a data parallel implementation of the count sort algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation. The full parallel version...

chapter

Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs

Canqun Yang, Zhen Ge, Juan Chen, Feng Wang, et al.

2009 15th International Conference on Parallel and Distributed Systems > 961 - 966

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

Solving complex convection-diffusion equations is very important to many practical mathematical and physical problems. After the finite difference discretization, most of the time for equations solution is spent on sparse linear equation solvers. In this paper, our goal is to solve 2D Nonlinear Unsteady Convection-Diffusion Equations by accelerating an iterative algorithm named Jacobi-preconditioned...

chapter

An Efficient GPU Implementation for Large Scale Individual-Based Simulation of Collective Behavior

U. Erra, B. Frola, V. Scarano, I. Couzin

2009 International Workshop on High Performance Computational Systems Biology > 51 - 58

2009 International Workshop on High Performance Computational Systems Biology (HiBi 2009)

In this work we describe a GPU implementation for an individual-based model for fish schooling. In this model each fish aligns its position and orientation with an appropriate average of its neighbors' positions and orientations. This carries a very high computational cost in the so-called nearest neighbors search. By leveraging the GPU processing power and the new programming model called CUDA we...

chapter

CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator

J. Siegel, J. Ributzka, Xiaoming Li

2009 International Conference on Parallel Processing Workshops > 174 - 181

2009 38th International Conference on Parallel Processing Workshops (ICPPW 2009)

Modern GPUs open a completely new field to optimize embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization. Some of the most notable ones are isolating the part of the algorithm that can be optimized to run on the GPU; tuning the program for the GPU memory hierarchy whose organization and performance implications...

chapter

Using a cluster as a memory resource: A fast and large virtual memory on MPI

H. Midorikawa, K. Saito, M. Sato, T. Boku

2009 IEEE International Conference on Cluster Computing and Workshops > 1 - 10

2009 IEEE International Conference on Cluster Computing and Workshops (CLUSTER)

The 64-bit OS provides ample memory address space that is beneficial for applications using a large amount of data. This paper proposes using a cluster as a memory resource for sequential applications requiring a large amount of memory. This system is an extension of our previously proposed socket-based distributed large memory system (DLM), which offers large virtual memory by using remote memory...

chapter

Leveraging Computation Sharing and Parallel Processing in Location-Based Services

J. Cazalas, Kien Hua

2009 International Conference on Computational Science and Engineering > 2 > 221 - 228

2009 International Conference on Computational Science and Engineering (CSE)

A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. In this paper, we introduce an efficient and scalable system for monitoring continuous queries by leveraging the parallel processing capability of the graphics processing unit. We examine...

chapter

Program Optimization of Stencil Based Application on the GPU-Accelerated System

Guibin Wang, Xuejun Yang, Ying Zhang, Tao Tang, et al.

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications > 219 - 225

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Graphic Processing Unit (GPU), with many light-weight data-parallel cores, can provide substantial parallel computational power to accelerate general purpose applications. But the powerful computing capacity could not be fully utilized for memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity...

chapter

Multi-agent traffic simulation with CUDA

D. Strippgen, K. Nagel

2009 International Conference on High Performance Computing & Simulation > 106 - 114

2009 International Conference on High Performance Computing & Simulation (HPCS)

Today's graphics processing units (GPU) have tremendous resources when it comes to raw computing power. The simulation of large groups of agents in transport simulation has a huge demand of computation time. Therefore it seems reasonable to try to harvest this computing power for traffic simulation. Unfortunately simulating a network of traffic is inherently connected with random memory access. This...

chapter

Hierarchical Agglomerative Clustering Using Graphics Processor with Compute Unified Device Architecture

S.A.A. Shalom, M. Dash, M. Tue, N. Wilson

2009 International Conference on Signal Processing Systems > 556 - 561

2009 International Conference on Signal Processing Systems (ICSPS)

We explore the use of today's high-end graphics processing units on desktops to perform hierarchical agglomerative clustering with the compute unified device architecture - CUDA of NVIDIA. Although the advancement in graphics cards has made the gaming industry flourish, there is a lot more to be gained in the field of scientific computing, high performance computing and their applications. Previous...

chapter

Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis

K. Madduri, D.A. Bader

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 11

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, social networking Web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction...

article

Novel Architectures: Solving Computational Problems with GPU Computing

J. Cohen, M. Garland

Computing in Science & Engineering > 2009 > 11 > 5 > 58 - 63

Modern GPUs are massively parallel microprocessors that can deliver very high performance for the parallel computations common in science and engineering.

INFONA - science communication portal
