Search results

Items from 1 to 20 out of 35 results

chapter

A programming model and runtime system for approximation-aware heterogeneous computing

Ioannis Parnassos, Nikolaos Bellas, Nikolaos Katsaros, Nikolaos Patsiatzis, more

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Heterogeneous platforms that include diverse architectures such as multicore CPUs, FPGAs and GPUs are becoming very popular due to their superior performance and energy efficiency. Besides heterogeneity, a promising approach for minimizing energy consumption is through approximate computing which relaxes the requirement that all parts of a program are considered equally important to the output quality,...

chapter

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti, more

2017 46th International Conference on Parallel Processing (ICPP) > 151 - 160

2017 46th International Conference on Parallel Processing (ICPP)

While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...

chapter

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Michael Wagner, Victor Lopez, Julian Morillo, Carlo Cavazzoni, more

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 243 - 250

2017 46th International Conference on Parallel Processing Workshops (ICPPW)

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. An increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the community of material science. Our approach focuses on, first, overlapping...

chapter

OpenMP device offloading to FPGA accelerators

Lukas Sommer, Jens Korinth, Andreas Koch

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 201 - 205

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.

chapter

Runtime Coordinated Heterogeneous Tasks in Charm++

Michael P. Robson, Ronak Buch, Laxmikant V. Kale

2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2) > 40 - 43

2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)

Effective utilization of the increasingly heterogeneous hardware in modern supercomputers is a significant challenge. Many applications have seen performance gains by using GPUs, but many implementations leave CPUs sitting idle. In this paper, we describe a runtime managed system for coordinating heterogeneous execution. This system manages data transfers to and from GPU devices and schedules work...

chapter

OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences

Khaled Hamidouche, Jie Zhang, Dhabaleswar K. Panda, Karen Tomko

2016 PGAS Applications Workshop (PAW) > 9 - 16

2016 PGAS Applications Workshop (PAW)

PGAS models, with a lightweight synchronization and shared memory abstraction, are seen as a good alternative to the Message Passing model for irregular communication patterns. OpenSHMEM is a library-based PGAS model. OpenSHMEM 1.3 introduced Non-Blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we present our experiences in designing Non-Blocking...

chapter

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing

Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, more

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 10

2016 IEEE International Symposium on Workload Characterization (IISWC)

Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale data parallel workloads, but are considered weak in processing serialized tasks and communicating with other devices. Pursuing a CPU-GPU collaborative computing model which takes advantage of both devices could provide an important breakthrough in realizing the full performance potential of heterogeneous computing...

chapter

CID: A Compile-Time Implementation Decider for Heterogeneous Platforms Based on C++ Attributes

Luis Miguel Sanchez, David del Rio Astorga, Manuel F. Dolz, Javier Fernandez

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) > 1149 - 1156

With the emergence of heterogeneous architectures, the development of parallel software has become an increasingly complex issue. The fact of using multiple programming models targeted to specific devices has turned the implementation process into a challenging task that comes along with a variety of difficulties. In this sense, developers are preoccupied with finding ways to alleviate the burden...

chapter

A pipeline-based runtime technique for improving Ray-Tracing on HSA-compliant systems

Chih-Chen Kao, Yu-Tsung Miao, Wei-Chung Hsu

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

2016 IEEE International Conference on Multimedia and Expo (ICME)

The prevalence of real time multimedia delivery appliances has led to the developments of a variety of efficient architectures and supporting software technologies. Especially, Ray-Tracing, a well-known physically-based rendering algorithm, has been receiving great attention in research and development. Unfortunately, Ray-Tracing algorithm, being one of the irregular applications, suffers from the...

chapter

OpenACC Programs Examined: A Performance Analysis Approach

Robert Dietrich, Guido Juckeland, Michael Wolfe

2015 44th International Conference on Parallel Processing > 310 - 319

2015 44th International Conference on Parallel Processing (ICPP)

The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives, it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...

chapter

JolokiaC++: Optimizing Irregular Accesses for GPGPU

Vibha Patel, Sanjeev Aggarwal, Amey Karkare

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 583 - 590

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

We present JolokiaC++, a compiler framework to ease coding of irregular data applications on GPUs. The effectiveness of the compiler and runtime systems of JolokiaC++ is tested using three kernels, IRREG, MOLDYN and NBF, executed on NVIDIA GPUs. We developed extensions for the generic parallel constructs that allow portable and efficient programming of codes with irregular accesses on the GPU. We present...

chapter

Heterogeneous Habanero-C (H2C): A Portable Programming Model for Heterogeneous Processors

Deepak Majeti, Vivek Sarkar

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 708 - 717

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

Heterogeneous architectures with their diverse architectural features impose significant programmability challenges. Existing programming systems involve non-trivial learning and are not productive, not portable, and are challenging to tune for performance. In this paper, we introduce Heterogeneous Habanero-C (H2C), which is an implementation of the Habanero execution model for modern heterogeneous...

chapter

PACXX: Towards a Unified Programming Model for Programming Accelerators Using C++14

Michael Haidl, Sergei Gorlatch

2014 LLVM Compiler Infrastructure in HPC > 1 - 11

2014 LLVM Compiler Infrastructure in HPC (LLVM-HPC)

We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of the current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with the restricted, C-like API for memory management, while the device code for the GPU has...

chapter

Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations

Qing Yi, Qian Wang, Huimin Cui

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 596 - 608

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

General purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specializations for different types of computations, compiler attained performance often lags behind those of the manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable the specialization of compiler optimizations without...

chapter

Power and performance analysis of the Graph 500 benchmark on the Single-chip Cloud Computer

Zhiquan Lai, King Tin Lam, Cho-Li Wang, Jinshu Su

Proceedings of 2014 International Conference on Cloud Computing and Internet of Things > 9 - 13

2014 International Conference on Cloud Computing and Internet of Things (CCIOT)

The concerns of data-intensiveness and energy awareness are actively reshaping the design of high-performance computing (HPC) systems nowadays. The Graph500 is a widely adopted benchmark for evaluating the performance of computing systems for data-intensive workloads. In this paper, we introduce a data-parallel implementation of Graph500 on the Intel Single-chip Cloud Computer (SCC). The SCC features...

chapter

Power and Energy Footprint of OpenMP Programs Using OpenMP Runtime API

Anilkumar Nandamuri, Abid M. Malik, Ahmad Qawasmeh, Barbara M. Chapman

2014 Energy Efficient Supercomputing Workshop > 79 - 88

2014 Energy Efficient Supercomputing Workshop (E2SC)

Power and energy have become dominant aspects of hardware and software design in the High Performance Computing (HPC). Recently, the Department of Defense (DOD) has put a constraint that applications and architectures need to attain 75 GFLOPS/Watt in order to support the future missions. This requires a significant research effort towards power and energy optimization. OpenMP programming model is...

chapter

Optimizing Collective Communication in UPC

Jithin Jose, Khaled Hamidouche, Jie Zhang, Akshay Venkatesh, more

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 361 - 370

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Message Passing Interface (MPI) has been the de facto programming model for scientific parallel applications. However, data-driven applications with irregular communication patterns are harder to implement using MPI. The Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. PGAS languages like UPC are growing in popularity because of...

chapter

KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs

Dmitry Mikushin, Nikolay Likhogrud, Eddy Z. Zhang, Christopher Bergstrom

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1011 - 1020

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

GPUs are becoming pervasive in scientific computing. Originally served as peripheral accelerators, now they are gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code such as OpenACC and HMPP simply off-load "hot" CPU code onto GPUs, entailing a lot of limitations such as unsupported external calls and coarse-grained...

chapter

Automatic execution of single-GPU computations across multiple GPUs

Javier Cabezas, Lluis Vilanova, Isaac Gelado, Thomas B. Jablin, more

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 467 - 468

2014 23rd International Conference on Parallel Architecture and Compilation (PACT)

We present AMGE, a programming framework and runtime system to decompose data and GPU kernels and execute them on multiple GPUs concurrently. AMGE exploits the remote memory access capability of recent GPUs to guarantee data accessibility regardless of its physical location, thus allowing AMGE to safely decompose and distribute arrays across GPU memories. AMGE also includes a compiler analysis to...

chapter

HSAIL: Portable compiler IR for HSA

Ben Sander

2013 IEEE Hot Chips 25 Symposium (HCS) > 1 - 32

2013 IEEE Hot Chips 25 Symposium (HCS)

Keywords: KERNEL, RUNTIME, PROGRAMMING


INFONA - science communication portal
