Search results

Items from 41 to 52 out of 52 results

chapter

A Data Communication Scheduler for Stream Programs on CPU-GPU Platform

Tao Tang, Xinhai Xu, Yisong Lin

2010 10th IEEE International Conference on Computer and Information Technology > 139 - 146

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

In recent years, heterogeneous parallel systems have become a major research focus in the high-performance computing field. Generally, in a heterogeneous parallel system, the CPU provides the basic computing environment and a special-purpose accelerator (a GPU in this paper) provides high computing performance. However, the overall performance of the system is prone to be limited by the data communication between...

chapter

Scalable and Parallel Implementation of a Financial Application on a GPU: With Focus on Out-of-Core Case

Myungho Lee, Jin-hong Jeon, Joonsuk Kim, Joonhyun Song

2010 10th IEEE International Conference on Computer and Information Technology > 1323 - 1327

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

The architecture of the latest Graphics Processing Unit (GPU) consists of a number of uniform programmable units integrated on the same chip, which facilitates general-purpose computing beyond graphics processing. With the multiple programmable units executing in parallel, the latest GPU shows superior performance for many non-graphics applications. Furthermore, programmers can have a direct control...

chapter

Improving the Performance of the Sparse Matrix Vector Product with GPUs

F Vázquez, G Ortega, J J Fernández, E M Garzón

2010 10th IEEE International Conference on Computer and Information Technology > 1146 - 1151

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

Sparse matrices are involved in linear systems, eigensystems and partial differential equations from a wide spectrum of scientific and engineering disciplines. Hence, the sparse matrix vector product (SpMV) is considered a key operation in engineering and scientific computing, and its optimization is very relevant for these applications. However, the irregular computation...

chapter

GPU-accelerated synthetic aperture radar backprojection in CUDA

Ahmed Fasih, Timothy Hartley

2010 IEEE Radar Conference > 1408 - 1413

2010 IEEE International Radar Conference

Pleasingly parallel algorithms such as filtered back-projection have been documented to enjoy significant speedups when ported to run on a graphics processor instead of a standard CPU. Presented here is a two-dimensional SAR backprojection implementation for a single GPU using the NVIDIA CUDA framework. Given that input range projections may be too large to fit in graphics memory, our implementation...

chapter

Simulating anomalous diffusion on graphics processing units

Karl Heinz Hoffmann, Michael Hofmann, Jens Lang, Gudula Runger, et al.

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) > 1 - 8

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010)

The computational power of modern graphics processing units (GPUs) has become an interesting alternative in high performance computing. The specialized hardware of GPUs delivers a high degree of parallelism and performance. Various applications in scientific computing have been implemented such that computationally intensive parts are executed on GPUs. In this article, we present a GPU implementation...

chapter

Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs

Canqun Yang, Zhen Ge, Juan Chen, Feng Wang, et al.

2009 15th International Conference on Parallel and Distributed Systems > 961 - 966

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

Solving complex convection-diffusion equations is very important to many practical mathematical and physical problems. After the finite difference discretization, most of the time for equations solution is spent on sparse linear equation solvers. In this paper, our goal is to solve 2D Nonlinear Unsteady Convection-Diffusion Equations by accelerating an iterative algorithm named Jacobi-preconditioned...

chapter

An Efficient GPU Implementation for Large Scale Individual-Based Simulation of Collective Behavior

U. Erra, B. Frola, V. Scarano, I. Couzin

2009 International Workshop on High Performance Computational Systems Biology > 51 - 58

2009 International Workshop on High Performance Computational Systems Biology (HiBi 2009)

In this work we describe a GPU implementation for an individual-based model for fish schooling. In this model each fish aligns its position and orientation with an appropriate average of its neighbors' positions and orientations. This carries a very high computational cost in the so-called nearest neighbors search. By leveraging the GPU processing power and the new programming model called CUDA we...

chapter

Leveraging Computation Sharing and Parallel Processing in Location-Based Services

J. Cazalas, Kien Hua

2009 International Conference on Computational Science and Engineering > 2 > 221 - 228

2009 International Conference on Computational Science and Engineering (CSE)

A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. In this paper, we introduce an efficient and scalable system for monitoring continuous queries by leveraging the parallel processing capability of the graphics processing unit. We examine...

chapter

Multi-agent traffic simulation with CUDA

D. Strippgen, K. Nagel

2009 International Conference on High Performance Computing & Simulation > 106 - 114

2009 International Conference on High Performance Computing & Simulation (HPCS)

Today's graphics processing units (GPUs) have tremendous resources when it comes to raw computing power. The simulation of large groups of agents in transport simulation demands a great deal of computation time. Therefore it seems reasonable to try to harvest this computing power for traffic simulation. Unfortunately, simulating a traffic network is inherently connected with random memory access. This...

chapter

A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-Compatible GPU

T. Okuyama, F. Ino, K. Hagihara

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications > 284 - 291

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications

This paper proposes a fast method for computing the costs of all-pairs shortest paths (APSPs) on the graphics processing unit (GPU). The proposed method is implemented using compute unified device architecture (CUDA), which offers us a development environment for performing general-purpose computation on the GPU. Our method is based on Harish's iterative algorithm that computes the cost of the single-source...

chapter

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

A. Nukada, Y. Ogata, T. Endo, S. Matsuoka

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Most GPU performance “hypes” have focused around tightly-coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies thereof have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, being...

article

GPULib: GPU Computing in High-Level Languages

P. Messmer, P.J. Mullowney, B.E. Granger

Computing in Science & Engineering > 2008 > 10 > 5 > 70 - 73

GPULib helps scientists and engineers take advantage of GPUs from within high-level programming environments without requiring any detailed knowledge of the GPU architecture.

Data set: ieee
Keywords: KERNEL, GPU, ARRAYS
Publication language: English

INFONA - science communication portal
