CUDA enables general-purpose parallel computing on GPUs (GPGPU) and has been applied to many computing fields. However, CUDA's multi-address-space architecture makes memory management complicated. NVIDIA introduced Unified Virtual Addressing (UVA) in CUDA Toolkit 4.0 to address this issue. However, UVA has platform limitations and can even lose performance under certain circumstances. We propose...
Today, the use of GPUs as coprocessors to accelerate high-performance scientific applications is becoming an important practice. Still, some of the high-level programming languages such as Java require extensions or new interfaces for utilising the huge parallelism of these new devices. In this paper, we propose extensions to an existing Java-based programming and parallel computing environment called...
GPUs and other accelerators are available on many different devices, and GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need to implement new algorithms from scratch, or to adapt sequential programs to accelerators, will always exist. Writing CUDA or OpenCL code, although an easier task...
Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging...
Modern computer systems are becoming increasingly heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed...
Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core processors, are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL,...
To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler...
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches - CUDA and OpenCL - are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers pre-implemented recurring computation and communication patterns (skeletons)...
Hybrid CPU/GPU computing architecture has recently become an alternative platform for high-performance computing. This architecture provides massive computational power with lower energy consumption and less economic cost than the traditional one using only CPUs. However, the complexity of GPU programming is too high for users to move their applications toward this hybrid computing architecture...
Clusters of GPUs are emerging as a new computational scenario. Programming them requires the use of hybrid models that increase the complexity of the applications, reducing the productivity of programmers. We present the implementation of OmpSs for clusters of GPUs, which supports asynchrony and heterogeneity for task parallelism. It is based on annotating a serial application with directives that...
As the demand for research on image/content authentication has significantly increased, many authentication schemes have been proposed so far, but most of them are time-consuming. This research concentrates on decreasing the time needed by an image authentication algorithm. In this paper, we present a CUDA-based implementation of a content authentication algorithm on NVIDIA's GeForce 8400 GS GPU...
The GPU (Graphics Processing Unit) provides high computational speed at very low cost compared to high-end systems. The field of parallel processing using GPUs is advancing very fast, with new technologies being introduced continually. With such advancements, it is necessary to review the major works done in this field. Graph traversal is one of the major challenges in this field. So far...
The current trend in medical research for the discovery of new drugs is the use of Virtual Screening (VS) methods. In these methods, the calculation of the non-bonded interactions, such as electrostatics or van der Waals forces, plays an important role, representing up to 80% of the total execution time. These kernels are computationally intensive and massively parallel in nature, and thus they are...
With the increasing diversity of computing systems and the rapid performance improvement of commodity hardware, heterogeneous clusters become the dominant platform for low-cost, high-performance computing. Grid-enabled and heterogeneous implementations of MPI establish it as the de facto programming model for these environments. On the other hand, task parallelism provides a natural way for exploiting...
CUDA has become a very popular programming paradigm in the parallel computing area. However, very little work has been done on characterizing CUDA kernels. In this work, we measure thread-level performance, collect basic-block-level characteristics, and glean instruction-level properties for about 35 programs from the CUDA SDK, Parboil, and Rodinia benchmark suites. In addition, we define basic...
Iris recognition stands out as one of the most accurate biometric methods in use today. However, iris recognition algorithms are currently implemented on general-purpose sequential processing systems, such as generic central processing units (CPUs). In this work, we present a more direct and parallel processing alternative using the graphics processing unit (GPU), which was originally used exclusively...
Hybrid CPU/GPU computing architecture has received great attention from researchers in high-performance computing. This new architecture provides higher computational performance than one that uses only CPUs for data computation. However, programming on this computing architecture is not easy, since programmers have to learn the GPU programming APIs and handle data communication between...
GPUs are slowly becoming ubiquitous devices in high-performance computing. Nvidia's newly released version 4.0 of the CUDA API [2] for GPU programming offers multiple ways to program GPUs and emphasizes multi-GPU environments, which are common in modern-day compute clusters. However, despite the subsequent progress in FLOP counts, the bane of large-scale computing systems has been increased...
Automatic compilation for multiple types of devices is important, especially given the current trends towards heterogeneous computing. This paper concentrates on some issues in compiling fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs. It points out some correctness pitfalls in existing techniques, particularly in their treatment to implicit synchronizations. It then describes...
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. However, when using multiple GPUs concurrently, the conventional data parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, overlapping fine grained computation with communication, etc. In this paper, we present a...