Search results

Items from 1 to 19 out of 19 results

chapter

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich, Felix Schmitt, Rene Widera, Michael Bussmann

2012 41st International Conference on Parallel Processing Workshops > 414 - 423

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors their programming is fairly complex and needs hardware specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...

chapter

Directive-based Programming for GPUs: A Comparative Study

Ruymán Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 410 - 417

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need of implementing new algorithms from scratch, or adapting sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...

chapter

Rootbeer: Seamlessly Using GPUs from Java

Philip C. Pratt-Szeliga, James W. Fawcett, Roy D. Welch

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 375 - 380

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

When converting a serial program to a parallel program that can run on a Graphics Processing Unit (GPU) the developer must choose what functions will run on the GPU. For each function the developer chooses, he or she needs to manually write code to: 1) serialize state to GPU memory, 2) define the kernel code that the GPU will execute, 3) control the kernel launch and 4) deserialize state back to CPU...

chapter

GPU-based Cloud computing for comparing the structure of protein binding sites

Matthias Leinweber, Lars Baumgartner, Marco Mernberger, Thomas Fober, et al.

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST) > 1 - 6

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST 2012) - Complex Environment Engineering

In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been...

chapter

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

Yongchao Liu, Bertil Schmidt

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 684 - 690

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Unprecedented production of short reads from the new high-throughput sequencers has posed challenges to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...

chapter

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant V. Kale

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2404 - 2413

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism...

chapter

Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

Jie Chen, Balint Joo, William Watson III, Robert Edwards

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2359 - 2368

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high performance host callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same...

chapter

Productive Programming of GPU Clusters with OmpSs

Javier Bueno, Judit Planas, Alejandro Duran, Rosa M. Badia, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 557 - 568

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Clusters of GPUs are emerging as a new computational scenario. Programming them requires the use of hybrid models that increase the complexity of the applications, reducing the productivity of programmers. We present the implementation of OmpSs for clusters of GPUs, which supports asynchrony and heterogeneity for task parallelism. It is based on annotating a serial application with directives that...

chapter

Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures

Naila Farooqui, Andrew Kerr, Greg Eisenhauer, Karsten Schwan, et al.

2012 IEEE International Symposium on Performance Analysis of Systems & Software > 58 - 67

2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)

As parallel execution platforms continue to proliferate, there is a growing need for real-time introspection tools to provide insight into platform behavior for performance debugging, correctness checks, and to drive effective resource management schemes. To address this need, we present the Lynx dynamic instrumentation system. Lynx provides the capability to write instrumentation routines that are...

chapter

The GPU Enhanced Parallel Computing for Large Scale Data Clustering

Xiaohui Cui, Jesse St. Charles, Thomas E. Potok

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery > 220 - 225

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)

Analyzing and clustering large scale data set is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its complexity $O(n^2)$. As the number of data and feature dimensions grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last...

chapter

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

George Bosilca, Aurelien Bouteiller, Thomas Herault, Pierre Lemarinier, et al.

2011 IEEE International Conference on Cluster Computing > 395 - 402

2011 IEEE International Conference on Cluster Computing (CLUSTER)

Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present...

chapter

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators

Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, et al.

2011 IEEE International Parallel & Distributed Processing Symposium > 932 - 943

2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

One of the major trends in the design of exascale architectures is the use of multicore nodes enhanced with GPU accelerators. Exploiting all resources of a hybrid accelerators-based node at their maximum potential is thus a fundamental step towards exascale computing. In this article, we present the design of a highly efficient QR factorization for such a node. Our method is in three steps. The first...

chapter

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, et al.

2011 IEEE International Parallel & Distributed Processing Symposium > 864 - 876

2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another...

chapter

Patterns of Inefficient Performance Behavior in GPU Applications

D. Eschweiler, D. Becker, F. Wolf

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 262 - 266

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance...

chapter

Fast multipole method on GPU: Tackling 3-D capacitance extraction on massively parallel SIMD platforms

Xueqian Zhao, Zhuo Feng

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC) > 558 - 563

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC)

To facilitate full chip capacitance extraction, field solvers are typically deployed for characterizing capacitance libraries for various interconnect structures and configurations. In the past decades, various algorithms for accelerating boundary element methods (BEM) have been developed to improve the efficiency of field solvers for capacitance extraction. This paper presents the first massively...

chapter

GPU-assisted malware

G. Vasiliadis, M. Polychronakis, S. Ioannidis

2010 5th International Conference on Malicious and Unwanted Software > 1 - 6

2010 5th International Conference on Malicious and Unwanted Software (MALWARE 2010)

Malware writers constantly seek new methods to obfuscate their code so as to evade detection by virus scanners. Two code-armoring techniques that pose significant challenges to existing malicious-code detection and analysis systems are unpacking and run-time polymorphism. In this paper, we demonstrate how malware can increase its robustness against detection by taking advantage of the ubiquitous Graphics...

chapter

Hybrid OpenCL: Connecting Different OpenCL Implementations over Network

R. Aoki, S. Oikawa, R. Tsuchiyama, T. Nakamura

2010 10th IEEE International Conference on Computer and Information Technology > 2729 - 2735

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

We are developing Hybrid OpenCL, which enables the connection between different OpenCL implementations over the network. Hybrid OpenCL consists of two elements, a runtime system that provides the abstraction of different OpenCL implementations and a bridge program that connects multiple OpenCL runtime systems over the network. Hybrid OpenCL enables the construction of the scalable OpenCL environments...

chapter

Accelerating spatial clustering detection of epidemic disease with graphics processing unit

Sisi Zhao, Chenghu Zhou

2010 18th International Conference on Geoinformatics > 1 - 6

2010 18th International Conference on Geoinformatics

The statistics of disease clustering is of interest to epidemiologists. In order to detect spatial clustering of disease in all the regions of China, we adopted a likelihood ratio based method which utilizes Monte Carlo simulation and spatial exploring to analyze the real time updating data stored in database. However, large number of random tests for Monte Carlo simulation and large scale of the...

chapter

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

José Duato, Antonio J. Peña, F. Silla, R. Mayo, et al.

2010 International Conference on High Performance Computing & Simulation > 224 - 231

2010 International Conference on High Performance Computing & Simulation (HPCS 2010)

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable...

Filter options

Data set: ieee
Keywords: KERNEL, RUNTIME, GRAPHICS PROCESSING UNIT
