Search results

chapter

Nuclear Fusion Simulation Code Optimization and Performance Evaluation on GPU Cluster

Norihisa Fujita, Hideo Nuga, Taisuke Boku, Yasuhiro Idomura

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1266 - 1274

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

The conservative global gyrokinetic toroidal full-f five-dimensional Vlasov simulation (GT5D) is a nuclear fusion simulation program designed to analyze turbulence phenomena in tokamak plasma. In this research, we optimize it for graphics processing unit (GPU) clusters with multiple GPUs on each node. Based on the profile results of a GT5D on a CPU node, it was decided to offload the entire time development...

chapter

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications

Felix Schmitt, Robert Dietrich, Guido Juckeland

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 908 - 915

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...

article

Interactive Mesostructures with Volumetric Collisions

Scott Nykl, Chad Mourning, David Chelberg

IEEE Transactions on Visualization and Computer Graphics > 2014 > 20 > 7 > 970 - 982

This paper presents a technique for interactively colliding with and deforming mesostructures at a per-texel level. It is compatible with a broad range of existing mesostructure rendering techniques including both safe and unsafe ray-height field intersection algorithms. This technique is able to replace traditional 3D geometrical deformations (vertex-based) with 2D image space operations (pixel-based)...

chapter

A Fast Runtime Visualization of a GPU-Based 3D-FDTD Electromagnetic Simulation

Kota Aoki, Keisuke Dohi, Yuichiro Shibata, Kiyoshi Oguri, et al.

2013 First International Symposium on Computing and Networking > 30 - 37

2013 First International Symposium on Computing and Networking (CANDAR)

In this paper, we present the design and implementation of a fast runtime visualizer for a GPU-based 3D-FDTD electromagnetic simulation. We focus on improving the productivity of simulator development without compromising simulation performance. To preserve portability, we implemented the visualizer with the MVC model, where the simulation kernels and the visualization process were completely separated...

chapter

Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency

Jianmin Chen, Xi Tao, Zhen Yang, Jih-Kwon Peir, et al.

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 441 - 451

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Modern general-purpose graphics processing units (GPGPUs) exploit parallelism in applications by building massively parallel architectures and applying multithreading technology to hide instruction and memory latencies. Such architectures are becoming increasingly popular for parallel applications written in the CUDA and OpenCL programming languages. In this paper, we investigate thread scheduling algorithms...

chapter

Parallel pattern mining on Graphics Processing Units

Krzysztof Hryniow

Proceedings of the 14th International Carpathian Control Conference (ICCC) > 134 - 139

2013 14th International Carpathian Control Conference (ICCC)

Frequent pattern mining is a field with many practical applications, where large computational power and speed are needed. Many state-of-the-art frequent pattern mining applications are inefficient solutions for both shared-memory and multiprocessor systems due to problems with parallelism and memory. One possible solution to the problem is the use of a Graphics Processing Unit (GPU) in the system...

chapter

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich, Felix Schmitt, Rene Widera, Michael Bussmann

2012 41st International Conference on Parallel Processing Workshops > 414 - 423

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors their programming is fairly complex and needs hardware specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...

chapter

GPU-based Real-time Decoding Technique for High-definition Videos

Huifang Deng, Chunhui Deng, Jingjing Li

2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing > 186 - 190

2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP)

In this paper, we first discuss the video decoding standard and its architecture, and then analyze the decoding complexity of each process. Using the CUDA programming model and taking advantage of the GPU to optimize the very time-consuming decoding stages of MC (motion compensation) and CSC (color space conversion), we propose an MC acceleration method based on CUDA, and...

chapter

Algorithmic strategies for optimizing the parallel reduction primitive in CUDA

Pedro J. Martin, Luis F. Ayuso, Roberto Torres, Antonio Gavilanes

2012 International Conference on High Performance Computing & Simulation (HPCS) > 511 - 519

2012 International Conference on High Performance Computing & Simulation (HPCS)

Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...

chapter

Ice Simulation Using GPGPU

Shadi Alawneh, Dennis Peters

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 425 - 431

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Simulation of the behaviour of a ship operating in pack ice is a computationally intensive process to which General Purpose Computing on Graphical Processing Units (GPGPU) can be applied. In this paper we present an efficient parallel implementation of such a simulator developed using the NVIDIA Compute Unified Device Architecture (CUDA). We have conducted an experiment to measure the relative performance...

chapter

Directive-based Programming for GPUs: A Comparative Study

Ruymán Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 410 - 417

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need of implementing new algorithms from scratch, or adapting sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...

chapter

Towards the Design of Systolic Genetic Search

Martin Pedemonte, Enrique Alba, Francisco Luna

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1778 - 1786

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

This paper elaborates on a new, fresh parallel optimization algorithm specially engineered to run on Graphic Processing Units (GPUs). The underlying operation relates to Systolic Computation. The algorithm, called Systolic Genetic Search (SGS) is based on the synchronous circulation of solutions through a grid of processing units and tries to profit from the parallel architecture of GPUs. The proposed...

chapter

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation

Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1696 - 1702

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme...

chapter

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant V. Kale

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2404 - 2413

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism...

chapter

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization

Zheng Cui, Yun Liang, Kyle Rupnow, Deming Chen

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 83 - 94

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graphics processing units (GPUs) are increasingly critical for general-purpose parallel processing performance. GPU hardware is composed of many streaming multiprocessors, each of which employs the single-instruction multiple-data (SIMD) execution style. This massively parallel architecture allows GPUs to execute tens of thousands of threads in parallel. Thus, GPU architectures efficiently execute...

chapter

A Technique for Collision Detection and 3D Interaction Based on Parallel GPU and CPU Processing

Fernando Tsuda, Ricardo Nakamura

2011 Brazilian Symposium on Games and Digital Entertainment > 36 - 42

2011 Brazilian Symposium on Games and Digital Entertainment (SBGAMES)

Efficient collision detection is a requirement for a large number of games. With the release of devices that enable full-body interaction, new challenges arise in this area. In this paper we present a technique for dynamic construction of octrees for collision detection, based on a cloud of points using GPGPU techniques. Since some algorithms are not suitable for the GPU processing model, our technique...

chapter

An OpenMP Compiler for Hybrid CPU/GPU Computing Architecture

Hung-Fu Li, Tyng-Yeu Liang, Jhen-Lin Jiang

2011 Third International Conference on Intelligent Networking and Collaborative Systems > 209 - 216

2011 Third International Conference on Intelligent Networking and Collaborative Systems (INCoS)

Hybrid CPU/GPU computing architecture has received great attention from researchers in high performance computing. This new architecture provides higher computation performance than one that uses only CPUs for data computation. However, programming on this architecture is not easy, since programmers have to learn the programming APIs of the GPU and handle data communication between...

chapter

GPGPU-based data parallel region growing algorithm for cell nuclei detection

Sandor Szenasi, Zoltan Vamossy, Miklos Kozlovszky

2011 IEEE 12th International Symposium on Computational Intelligence and Informatics (CINTI) > 493 - 499

2011 IEEE 12th International Symposium on Computational Intelligence and Informatics (CINTI)

Nowadays, microscopic analysis of tissue samples is increasingly done using digital imagery and special immunodiagnostic software. These are typically specific applications developed for one distinct field, but some subroutines are commonly repeated; for example, several applications contain steps that detect cell nuclei in a sample image. The aim of our research is to develop a new data parallel...

chapter

Parallel biomedical image processing with GPGPUs in cancer research

Attila Remenyi, Sandor Szenasi, Istvan Bandi, Zoltan Vamossy, et al.

3rd IEEE International Symposium on Logistics and Industrial Informatics > 245 - 248

2011 3rd IEEE International Symposium on Logistics and Industrial Informatics (LINDI 2011)

The main aim of this work is to show how GPGPUs can facilitate certain types of image processing methods. The software described in this paper detects a special tissue component, the nuclei, in HE (hematoxylin-eosin) stained colon tissue sample images. Since pathologists work with large numbers of high-resolution images, which require significant storage space, one feasible way to achieve reasonable...

chapter

Fast map projection on CUDA

Yanwei Zhao, Zhenlin Cheng, Hui Dong, Jinyun Fang, et al.

2011 IEEE International Geoscience and Remote Sensing Symposium > 4066 - 4069

IGARSS 2011 - 2011 IEEE International Geoscience and Remote Sensing Symposium

Map projection is a key task in cartography that transforms geographical coordinates from one coordinate system to another. It has been widely used in Geographic Information System (GIS) applications. However, map projection is a very time-consuming task, and fast processing speed is often required in interactive GIS scenarios. Parallel computation provides an opportunity to reduce run times. Nowadays,...

INFONA - science communication portal
