SC16: International Conference for High Performance Computing, Networking, Storage and Analysis

chapter

PIPES: A Language and Compiler for Task-Based Programming on Distributed-Memory Clusters

Martin Kong, Louis-Noel Pouchet, P. Sadayappan, Vivek Sarkar

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 456-467

Applications running on clusters of shared-memory computers are often implemented using OpenMP+MPI. Productivity can be vastly improved using task-based programming, a paradigm where the user expresses the data and control-flow relations between tasks, offering the runtime maximal freedom to place and schedule tasks. While productivity is increased, high-performance execution remains challenging:...

chapter

MUSA: A Multi-level Simulation Approach for Next-Generation HPC Machines

Thomas Grass, Cesar Allande, Adria Armejach, Alejandro Rico, et al.

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 526-537

The complexity of High Performance Computing (HPC) systems is increasing in the number of components and their heterogeneity. Interactions between software and hardware involve many different aspects which are typically not transparent to scientific programmers and system architects. Therefore, predicting the behavior of current scientific applications on future HPC infrastructures is a challenging...

chapter

Understanding Error Propagation in GPGPU Applications

Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 240-251

GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...

chapter

Simulations of Below-Ground Dynamics of Fungi: 1.184 Pflops Attained by Automated Generation and Autotuning of Temporal Blocking Codes

Takayuki Muranushi, Hideyuki Hotta, Junichiro Makino, Seiya Nishizawa, et al.

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 23-33

Stencil computation has many applications in science and engineering, thus many optimization techniques such as temporal blocking have been developed. They are, however, rarely used in real-world applications, since a large amount of careful programming is required for even the simplest of stencils. We introduce Formura, a domain specific language that provides easy access to optimized stencil computations...

chapter

Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs

Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, et al.

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 409-420

We evaluate the power and performance of the Rodinia benchmark suite using the Altera SDK for OpenCL targeting a Stratix V FPGA against a modern CPU and GPU. We study multiple OpenCL kernels per benchmark, ranging from direct ports of the original GPU implementations to loop-pipelined kernels specifically optimized for FPGAs. Based on our results, we find that even though OpenCL is functionally portable...

chapter

Elastic Multi-resource Fairness: Balancing Fairness and Efficiency in Coupled CPU-GPU Architectures

Shanjiang Tang, BingSheng He, Shuhao Zhang, Zhaojie Niu

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 875-886

Fairness and efficiency are two important concerns for users in a shared computer system, and there tends to be a tradeoff between them. Heterogeneous computing poses new challenging issues on the fair allocation of computational resources among users due to the availability of different kinds of computing devices (e.g., CPU and GPU). Prior work either considers the fair resource allocation separately...

chapter

Perilla: Metadata-Based Optimizations of an Asynchronous Runtime for Adaptive Mesh Refinement

Tan Nguyen, Didem Unat, Weiqun Zhang, Ann Almgren, et al.

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 945-956

Hardware architecture is increasingly complex, urging the development of asynchronous runtime systems with advanced resource and locality management support. However, this support may come at the cost of complicating the user interface, while programming remains one of the major constraints to wide adoption of asynchronous runtimes in practice. In this paper, we propose a solution that leverages...

chapter

Daino: A High-Level Framework for Parallel and Efficient AMR on GPUs

Mohamed Wahib, Naoya Maruyama, Takayuki Aoki

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 621-632

Adaptive Mesh Refinement methods reduce computational requirements of problems by increasing resolution for only areas of interest. However, in practice, efficient AMR implementations are difficult considering that the mesh hierarchy management must be optimized for the underlying hardware. Architecture complexity of GPUs can render efficient AMR particularly challenging in GPU-accelerated...

chapter

Translating OpenMP Device Constructs to OpenCL Using Unnecessary Data Transfer Elimination

Junghyun Kim, Yong-Jun Lee, Jungho Park, Jaejin Lee

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 597-608

In this paper, we propose a framework that translates OpenMP 4.0 accelerator directives to OpenCL. By translating an OpenMP program to an OpenCL program, the program can be executed on any hardware platform that supports OpenCL. We also propose a run-time optimization technique that automatically eliminates unnecessary data transfers between the host and the target accelerator. It exploits the page-fault...

chapter

dCUDA: Hardware Supported Overlap of Computation and Communication

Tobias Gysi, Jeremia Bar, Torsten Hoefler

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 609-620

Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side...
