Search results

Items from 21 to 40 out of 52 results

chapter

GPU accelerated blood flow computation using the Lattice Boltzmann Method

Cosmin Nita, Lucian Mihai Itu, Constantin Suciu

2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

We propose a numerical implementation based on a Graphics Processing Unit (GPU) for the acceleration of the execution time of the Lattice Boltzmann Method (LBM). The study focuses on the application of the LBM for patient-specific blood flow computations, and hence, to obtain higher accuracy, double precision computations are employed. The LBM specific operations are grouped into two kernels, whereas...
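
A minimal sketch of the two per-node LBM steps on a D2Q9 lattice, assuming the common BGK single-relaxation-time model. The abstract is truncated before naming its two kernels, so pairing "collide" and "stream" here is an assumption, and all function names are hypothetical. Python floats are IEEE doubles, matching the double precision the abstract mentions; a GPU version would typically assign one thread per lattice node.

```python
# D2Q9 lattice: discrete velocities c_i and weights w_i (standard values).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium populations for one lattice node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def macroscopic(f):
    """Density and velocity recovered from the 9 populations of one node."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy


def collide(f, tau):
    """BGK collision: relax f toward equilibrium with relaxation time tau."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def stream(grid, nx, ny):
    """Periodic streaming: move population i from node (x, y) to (x + cx, y + cy).

    grid[y][x] is the list of 9 populations at node (x, y).
    """
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (cx, cy) in enumerate(C):
                out[(y + cy) % ny][(x + cx) % nx][i] = grid[y][x][i]
    return out
```

Collision touches only the populations of a single node while conserving its mass and momentum, and streaming only moves populations between neighbours, which is why the two are natural candidates for separate kernels.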

chapter

Accelerating a novel particle-based fluid simulation on the GPU

Zhilu Chen, James Kingsley, Xinming Huang, Erkan Tuzel

2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. when using a large array of particles...
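
A minimal sketch of the SRD collision step in two dimensions, assuming the standard rule in which each particle's velocity relative to its cell's mean velocity is rotated by a fixed angle +alpha or -alpha, with the sign chosen at random per cell. `srd_collide` and `bin_particles` are hypothetical names; the streaming step (x += v * dt) and the random grid shift that full SRD uses to restore Galilean invariance are omitted.

```python
import math
import random


def bin_particles(positions, cell_size):
    """Group particle indices by the square collision cell they fall into."""
    cells = {}
    for idx, (x, y) in enumerate(positions):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells.setdefault(key, []).append(idx)
    return cells


def srd_collide(velocities, alpha, rng):
    """One SRD collision for the particles of a single cell (2D).

    Velocities relative to the cell's mean velocity are rotated by +alpha
    or -alpha, which conserves the cell's momentum and kinetic energy.
    """
    n = len(velocities)
    if n == 0:
        return []
    mx = sum(vx for vx, _ in velocities) / n
    my = sum(vy for _, vy in velocities) / n
    a = alpha if rng.random() < 0.5 else -alpha   # random rotation sense per cell
    ca, sa = math.cos(a), math.sin(a)
    out = []
    for vx, vy in velocities:
        dx, dy = vx - mx, vy - my
        out.append((mx + ca * dx - sa * dy, my + sa * dx + ca * dy))
    return out
```

Because every cell is processed independently, cells (or particles within a cell) are the natural unit of GPU parallelism for this step.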

chapter

A Checkpoint/Restart Scheme for CUDA Applications with Complex Memory Hierarchy

Xinyuan Guo, Hai Jiang, Kuan-Ching Li

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 247-252

Checkpoint/restart has been an effective mechanism to achieve fault tolerance for many scientific applications. However, although GPUs now play a much bigger role in high performance computing, there is as yet no effective checkpoint/restart scheme for them, owing to their batch-mode execution. The paper proposes an application-level checkpoint/restart scheme to save and restore GPU computation states. A precompiler...

chapter

Experimental framework for searching large RDF on GPUs based on key-value storage

Chidchanok Choksuchat, Chantana Chantrapornchai

2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 171-176

The Resource Description Framework (RDF) is commonly used for semantic web queries. Over the past decade, driven by big data processing, large numbers of RDF triples have been crawled. The triples are usually stored in a distributed fashion on cloud storage or large clusters, which makes it difficult to search for a query answer across platforms. Also, the search takes a long execution time....
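
The abstract is truncated before describing its storage layout, so the following is only a hypothetical illustration of the key-value idea: each triple is stored under its subject, predicate and object keys, so a query pattern with one bound term becomes a key lookup followed by a filter over the candidates. `TripleStore` and `match` are invented names; on a GPU, the filter over candidates is the part that would run in parallel.

```python
from collections import defaultdict


class TripleStore:
    """Toy key-value layout for RDF triples (hypothetical, for illustration)."""

    def __init__(self):
        self.by_s = defaultdict(list)   # subject   -> triples
        self.by_p = defaultdict(list)   # predicate -> triples
        self.by_o = defaultdict(list)   # object    -> triples
        self.all = []

    def add(self, s, p, o):
        t = (s, p, o)
        self.all.append(t)
        self.by_s[s].append(t)
        self.by_p[p].append(t)
        self.by_o[o].append(t)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self.by_s.get(s, [])
        elif p is not None:
            candidates = self.by_p.get(p, [])
        elif o is not None:
            candidates = self.by_o.get(o, [])
        else:
            candidates = self.all
        # Filter step: each candidate can be checked independently.
        return [t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```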

chapter

High performance multi-dimensional (2D/3D) FFT-Shift implementation on Graphics Processing Units (GPUs)

Marwan Abdellah, Salah Saleh, Ayman Eldeib, Amr Shaarawi

2012 Cairo International Biomedical Engineering Conference (CIBEC), pp. 171-174

Frequency domain analysis is one of the most common analysis techniques in signal and image processing. The Fast Fourier Transform (FFT) is a well-known tool used to perform such analysis by obtaining the frequency spectrum of time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from this stage as it centers the DC component of the resulting...
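
For reference, FFT-shift on an axis of length n moves element i to (i + n // 2) mod n, so the DC term at index 0 lands at the center. A minimal 2D sketch follows (function names hypothetical), using the same convention as NumPy's `numpy.fft.fftshift`. Because every output index is computed independently, each element can be handled by one GPU thread; the 3D case adds a third index in the same way.

```python
def fftshift2d(a):
    """Move the zero-frequency (DC) element of a 2D list to the center."""
    rows, cols = len(a), len(a[0])
    out = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[(i + rows // 2) % rows][(j + cols // 2) % cols] = a[i][j]
    return out


def ifftshift2d(a):
    """Inverse of fftshift2d (differs from it only for odd sizes)."""
    rows, cols = len(a), len(a[0])
    out = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[(i - rows // 2) % rows][(j - cols // 2) % cols] = a[i][j]
    return out
```

For even sizes the shift is its own inverse; for odd sizes the inverse must shift by -(n // 2), which is why the two functions are kept separate.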

chapter

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, et al.

2012 IEEE International Conference on Cluster Computing (CLUSTER), pp. 468-476

Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory. MPI implementations currently do not provide an efficient means of utilizing data types for noncontiguous communication of data in GPU memory. To address this...
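
A minimal sketch of what packing noncontiguous data means, assuming the simplest strided layout: the one described by MPI's `MPI_Type_vector(count, blocklength, stride, ...)`. The functions below are hypothetical pure-Python stand-ins; in an MPI+GPU setting this gather/scatter would run as a GPU kernel, so that only one contiguous buffer has to cross PCIe or the network.

```python
def pack_vector(buf, offset, count, blocklength, stride):
    """Gather `count` blocks of `blocklength` elements, `stride` elements
    apart (starting at `offset`), into one contiguous list."""
    out = []
    for b in range(count):
        start = offset + b * stride
        out.extend(buf[start:start + blocklength])
    return out


def unpack_vector(packed, buf, offset, count, blocklength, stride):
    """Scatter a contiguous list back into the same strided layout, in place."""
    for b in range(count):
        start = offset + b * stride
        buf[start:start + blocklength] = packed[b * blocklength:(b + 1) * blocklength]
```

For example, column 1 of a 4 x 3 row-major matrix is `pack_vector(buf, 1, 4, 1, 3)`.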

chapter

Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs

Peng Di, Ding Ye, Yu Su, Yulei Sui, et al.

2012 41st International Conference on Parallel Processing (ICPP), pp. 350-359

Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra and inter-SM parallelism for GPUs remains a...
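
A minimal sketch of loop tiling on a simple 2D loop nest (a matrix transpose), assuming plain rectangular tiles rather than the polyhedral transformations the abstract refers to. The two outer loops enumerate tiles, which on a GPU would typically map to thread blocks (inter-SM parallelism); the two inner loops cover one tile, which would map to threads within a block (intra-SM parallelism). `transpose_tiled` is a hypothetical name.

```python
def transpose_tiled(a, tile):
    """Transpose an n x m matrix by visiting it in tile x tile blocks."""
    n, m = len(a), len(a[0])
    out = [[None] * n for _ in range(m)]
    for ii in range(0, n, tile):            # tile loops: one GPU block per tile
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):      # intra-tile loops:
                for j in range(jj, min(jj + tile, m)):  # one thread per (i, j)
                    out[j][i] = a[i][j]
    return out
```

The `min(...)` bounds handle partial tiles at the edges, the same boundary case a generated CUDA kernel has to guard against.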

chapter

Preemption of a CUDA Kernel Function

Jon Calhoun, Hai Jiang

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), pp. 247-252

As graphics processing units (GPUs) gain adoption as general purpose parallel compute devices, several key problems need to be addressed in order for their use to become more practical and more user friendly. One such problem is that kernel functions, the special functions designed to execute on GPUs, are non-preemptible. Once a kernel is issued to the GPU, it remains there until either execution...
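
The abstract is truncated before describing its mechanism, so the following only sketches one common workaround, not necessarily the authors' method: split a long kernel into a series of short launches and check between launches whether the GPU should be yielded, recording the next block index as the resume point. `square_kernel` stands in for a real kernel launch, and all names are hypothetical.

```python
def square_kernel(data, out, first_block, num_blocks, block_size):
    """Stand-in for one kernel launch that covers a contiguous range of blocks."""
    for b in range(first_block, first_block + num_blocks):
        for t in range(block_size):
            i = b * block_size + t
            if i < len(data):          # bounds check, as in a real kernel
                out[i] = data[i] * data[i]


def run_preemptible(data, out, block_size, blocks_per_launch, should_yield, resume_from=0):
    """Issue the work as several small launches.

    Between launches, `should_yield()` is consulted; if it returns True the
    function stops and returns the next block index to resume from.
    Returns None once all blocks have been processed.
    """
    total_blocks = (len(data) + block_size - 1) // block_size
    b = resume_from
    while b < total_blocks:
        n = min(blocks_per_launch, total_blocks - b)
        square_kernel(data, out, b, n, block_size)
        b += n
        if b < total_blocks and should_yield():
            return b                   # preemption point: progress is just `b`
    return None
```

The trade-off is launch overhead versus preemption latency: smaller `blocks_per_launch` lets the GPU be yielded sooner but costs more launches.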

chapter

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

Yongchao Liu, Bertil Schmidt

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 684-690

The unprecedented volume of short reads produced by new high-throughput sequencers poses the challenge of aligning them to reference genomes with both high sensitivity and high speed. Many CPU-based short-read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...
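
Seed generation with the Burrows-Wheeler Transform usually relies on FM-index backward search, which counts the exact occurrences of a seed by processing it from its last character to its first. The sketch below is a minimal, unoptimized illustration with hypothetical names: it builds the BWT from sorted rotations and stores a full occurrence table, whereas practical aligners use suffix-array construction and sampled tables.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations ('$' is the sentinel)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def fm_index(b):
    """C[c]: number of characters smaller than c; occ[c][i]: count of c in b[:i]."""
    counts = {}
    for ch in b:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(b) + 1) for ch in counts}
    for i, ch in enumerate(b):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: number of exact occurrences of `pattern` in the text."""
    lo, hi = 0, n                      # current interval of sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each seed's search is independent of the others, which is what makes batches of seeds a good fit for one GPU thread (or warp) per seed.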

chapter

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs

Daichi Mukunoki, Daisuke Takahashi

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 1378-1386

We implemented and evaluated the triple precision Basic Linear Algebra Subprograms (BLAS) subroutines, AXPY, GEMV and GEMM on a Tesla C2050. In this paper, we present a Double Single (D+S) type triple precision floating-point value format and operations. They are based on techniques similar to Double-Double (DD) type quadruple precision operations. On the GPU, the D+S-type operations are more costly...
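
The abstract describes the D+S format as based on techniques similar to double-double (DD) arithmetic. As an illustration of that underlying building block only (not the paper's D+S implementation), here is Knuth's error-free TwoSum and a simple DD addition on Python floats, which are IEEE doubles. `dd_add` is a simplified variant that does not carry the full accuracy guarantees of careful DD libraries in every case.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free transformation: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Same as two_sum but assumes |a| >= |b| (fewer operations)."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two (hi, lo) double-double numbers (simple variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)
```

The pair (s, e) represents a + b exactly, which is where the extra precision comes from, and also why each DD (or D+S) operation costs several floating-point instructions.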

chapter

Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

Jie Chen, Balint Joo, William Watson III, Robert Edwards

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 2359-2368

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high performance host callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same...
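
C++ expression templates encode an expression such as `a + b * c` as a tree of types, so evaluating it becomes a single loop with no temporary vectors; that loop body is what an offloading scheme can turn into one GPU kernel. The sketch below mimics the idea in Python with a runtime tree (in C++ the tree exists at compile time). `Vec`, `BinOp` and `evaluate` are hypothetical names, not the authors' API.

```python
class Expr:
    """Base class: arithmetic builds a lazy tree instead of computing."""

    def __add__(self, other):
        return BinOp(self, other, lambda x, y: x + y)

    def __mul__(self, other):
        return BinOp(self, other, lambda x, y: x * y)


class Vec(Expr):
    """Leaf node holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node: combines element i of both operands on demand."""

    def __init__(self, lhs, rhs, op):
        self.lhs, self.rhs, self.op = lhs, rhs, op

    def __len__(self):
        return len(self.lhs)

    def at(self, i):
        return self.op(self.lhs.at(i), self.rhs.at(i))


def evaluate(expr):
    """One fused loop over the whole tree; on a GPU, one thread per index i."""
    return Vec(expr.at(i) for i in range(len(expr)))
```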

chapter

Image Authentication Algorithm on GPU

P.L.V. Vihari, Manoj Mishra

2012 International Conference on Communication Systems and Network Technologies (CSNT), pp. 874-878

As demand for research on image/content authentication has increased significantly, many authentication schemes have been proposed. However, most of them are time-consuming. This research concentrates on reducing the time needed by an image authentication algorithm. In this paper, we present a CUDA-based implementation of a content authentication algorithm on NVIDIA's GeForce 8400 GS GPU...

chapter

Implementation of graph algorithms over GPU: A comparative analysis

Swarish Dashora, Nilay Khare

2012 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS), pp. 1-8

GPUs (Graphics Processing Units) provide high computational speed at a very low cost compared to high-end systems. The field of parallel processing using GPUs is advancing very fast, with new technology being introduced every day. With such advancements, it is necessary to review the major works done in this field. Graph traversal is one of the major challenges in this field. So far...
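
The abstract does not reach its findings, but a common starting point for GPU graph traversal is level-synchronous breadth-first search, sketched below with a hypothetical function name. Each iteration expands the entire current frontier; on a GPU each frontier vertex (or edge) would usually get its own thread, and the concurrent updates to `dist` need care, for example atomic operations.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency list.

    Returns the hop distance of every vertex from `source` (-1 if unreachable).
    """
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        for u in frontier:             # parallel over frontier vertices on a GPU
            for v in adj[u]:
                if dist[v] == -1:      # first visit wins
                    dist[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier
        level += 1
    return dist
```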

chapter

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

Marwa Elteir, Heshan Lin, Wu-Chun Feng

2011 IEEE International Conference on Cluster Computing (CLUSTER), pp. 234-243

Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially...

chapter

Multiphase LBM Distributed over Multiple GPUs

Carlos Rosales

2011 IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-7

A parallel distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios is described in this paper. Validation runs studying the terminal velocity of a rising bubble under the effect of gravity show good agreement with the expected theoretical values. The code is benchmarked against the performance of a typical CPU implementation of the same algorithm...

chapter

A Highly Scalable Solution of an NP-Complete Problem Using CUDA

S Islam, R Tandon, S Singh, A Misra

2011 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011), pp. 93-98

NP-complete problems are among the most complex problems in computer science, but their vast real-world applications continually push scientists to explore new ways to solve them. We extended the original problem definition of the Boolean Satisfiability Problem to finding all satisfiable solutions of a given problem instance and used the massively parallel architecture of CUDA (Compute Unified Device...
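
One straightforward way to find all satisfying assignments in parallel, assumed here for illustration because the abstract is truncated before the authors' scheme, is to let each thread decode its global index into a truth assignment and test every clause. `decode`, `satisfies` and `all_solutions` are hypothetical names; clauses use DIMACS-style signed integers.

```python
def decode(index, num_vars):
    """Map an integer in [0, 2**num_vars) to an assignment: bit k -> variable k+1."""
    return [((index >> k) & 1) == 1 for k in range(num_vars)]


def satisfies(clauses, assignment):
    """Clauses are lists of literals: +k means x_k, -k means NOT x_k."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)


def all_solutions(clauses, num_vars):
    """Test every assignment; each index is independent work (one GPU thread each)."""
    return [i for i in range(2 ** num_vars)
            if satisfies(clauses, decode(i, num_vars))]
```

This exhaustive search does 2^n work, so it only suits modest numbers of variables; its appeal on a GPU is that every index can be checked independently.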

chapter

GPU-S2S: A Compiler for Source-to-Source Translation on GPU

Dan Li, Haijun Cao, Xiaoshe Dong, Bao Zhang

2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010), pp. 144-148

CUDA facilitates the development of General Purpose computing on Graphics Processing Units (GPGPU); however, its complex memory system, thread-level structure, and data transmission control between memories pose great challenges for GPU programming. In order to facilitate the development of parallel programs on GPU and reuse existing sequential codes, in this paper we propose a novel directive...

chapter

Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format

Wei Cao, Lu Yao, Zongzhe Li, Yongxian Wang, et al.

2010 International Conference on Computer Application and System Modeling (ICCASM 2010), vol. 11, pp. V11-161 to V11-165

The Sparse Matrix-Vector product (SpMV) is a key operation in engineering and scientific computing. Methods for efficiently implementing it in parallel are critical to the performance of many applications. Modern Graphics Processing Units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA's CUDA, have gained interest as a viable architecture for data-parallel general...
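
The abstract is truncated before defining its hybrid format. One widely used hybrid layout, assumed here for illustration, stores the first k nonzeros of every row in ELL form (regular and padded, which suits coalesced GPU access) and the overflow in COO form. The sketch converts CSR to that layout and multiplies; the names are hypothetical and k is a tuning parameter.

```python
def csr_to_hyb(indptr, indices, values, k):
    """Split a CSR matrix into an ELL part (first k nonzeros of every row)
    and a COO part holding the overflow entries."""
    n_rows = len(indptr) - 1
    ell_cols = [[-1] * k for _ in range(n_rows)]    # -1 marks padding
    ell_vals = [[0.0] * k for _ in range(n_rows)]
    coo = []                                        # (row, col, value) triples
    for r in range(n_rows):
        for slot, p in enumerate(range(indptr[r], indptr[r + 1])):
            if slot < k:
                ell_cols[r][slot] = indices[p]
                ell_vals[r][slot] = values[p]
            else:
                coo.append((r, indices[p], values[p]))
    return ell_cols, ell_vals, coo


def hyb_spmv(ell_cols, ell_vals, coo, x):
    """y = A x: one pass over the regular ELL slots, then the COO remainder."""
    y = [0.0] * len(ell_cols)
    for r, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):  # one thread per row
        for c, v in zip(cols, vals):
            if c >= 0:
                y[r] += v * x[c]
    for r, c, v in coo:                 # irregular tail (atomics or segmented sum on a GPU)
        y[r] += v * x[c]
    return y
```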

chapter

Exploiting Parallelism in Iterative Irregular Maxflow Computations on GPU Accelerators

S Solomon, P Thulasiraman, R K Thulasiram

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010), pp. 297-304

The Graphics Processing Unit (GPU) is an asymmetric, heterogeneous multi-core architecture that can be used for high performance parallel computing applications. However, a significant level of interest has been focused on algorithms for solving regular problems, as these applications typically map well to the GPU. Irregular applications, which rely on pointer or graph-based data structures, have...

chapter

Parallelization of binary and real-coded genetic algorithms on GPU using CUDA

Ramnik Arora, Rupesh Tulshyan, Kalyanmoy Deb

2010 IEEE Congress on Evolutionary Computation, pp. 1-8

Genetic Algorithms (GAs) are suitable for parallel computing since the fitness of population members may be evaluated in parallel. Most past parallel GA studies have exploited this aspect, besides resorting to different algorithms, such as island, single-population master-slave, fine-grained and hybrid models. A GA involves a number of other operations which, if parallelized, may lead to better parallel GA...

Data set: ieee
Keywords: KERNEL, GPU, ARRAYS
Publication language: English


INFONA - science communication portal
