Search results

Items from 1 to 20 out of 56 results

chapter

Aggressive pipelining of irregular applications on reconfigurable hardware

Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 575 - 586

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...

chapter

Introducing parallel computing concepts in computer system related courses

Han Wan, Xiaopeng Gao, Xiang Long, Bo Jiang

2017 IEEE Frontiers in Education Conference (FIE) > 1 - 7

2017 IEEE Frontiers in Education Conference (FIE)

All semiconductor market domains are converging to concurrent platforms. This trend has certainly led real challenge to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals. This paper argues that the Computer System related courses are natural places to introduce the parallelism, and the earlier to parallel computing concepts...

chapter

Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning

Konstantinos Krommydas, Ruchira Sasanka, Wu-chun Feng

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 213 - 218

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Programming FPGAs has been an arduous task that requires extensive knowledge of hardware design languages (HDLs), such as Verilog or VHDL, and low-level hardware details. With OpenCL support for FPGAs, the design, prototyping and implementation of an FPGA is increasingly moving towards a much higher level of abstraction, when compared to the intrinsically low-level nature of HDLs. On the other hand,...

chapter

Employing Compression Solutions under OpenACC

Ebad Salehi, Ahmad Lashgar, Amirali Baniasadi

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 348 - 356

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

For GPUs to achieve their peak performance, effective and efficient usage of memory bandwidth is necessary. To this end, programmers invest extensive development effort to optimize a GPU program, specially its memory bandwidth usage. The OpenACC programming model has been introduced to tackle the accelerators programming complexity. However, this model's coarse-grained control on a program can make...

chapter

A Component Based Graphical Parallel Programming Approach for Numerical Simulation Development

Liao Li, Mo Zeyao, Zhang Aiqing

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS) > 298 - 303

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS)

Building massively parallel numerical simulations is not easy due to lasting changes of parallel programming models and various software technologies needed. We develop a component based graphical parallel programming approach to lower the difficulties of coding applications in scientific and engineering computing and support rapid development of large scale simulations basing on a domain specific...

chapter

Using an Intermediate Representation to Map Workloads on Heterogeneous Parallel Systems

Nicolas Benoit, Stephane Louise

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 811 - 819

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. As an extensive knowledge of processor's internals cannot be a prerequisite to their programming and for the sake of portability, these systems necessitate the compilation flow to evolve and cope with heterogeneity issues. This is even more so true for embedded systems. In this paper, we...

chapter

Using type transformations to generate program variants for FPGA design space exploration

Syed Waqar Nabi, Wim Vanderbauwhede

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correct-by-construction program variants through type transformations...

chapter

CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters

Mohammed Sourouri, Johannes Langguth, Filippo Spiga, Scott B. Baden, et al.

2015 IEEE 18th International Conference on Computational Science and Engineering > 17 - 26

2015 IEEE 18th International Conference on Computational Science and Engineering (CSE)

On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...

chapter

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Hyoukjoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Tiark Rompf, et al.

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 63 - 74

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which...

chapter

OpenARC: Extensible OpenACC Compiler Framework for Directive-Based Accelerator Programming Study

Seyong Lee, Jeffrey S. Vetter

2014 First Workshop on Accelerator Programming using Directives > 1 - 11

2014 First Workshop on Accelerator Programming using Directives (WACCPD)

Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research...

chapter

POSTER: Utilizing dataflow-based execution for coupled cluster methods

Heike McCraw, Anthony Danalis, Thomas Herault, George Bosilca, et al.

2014 IEEE International Conference on Cluster Computing (CLUSTER) > 296 - 297

2014 IEEE International Conference On Cluster Computing (CLUSTER)

Computational chemistry comprises one of the driving forces of High Performance Computing. In particular, many-body methods, such as Coupled Cluster methods (CC) [1] of the quantum chemistry package NWCHEM [2], are of particular interest for the applied chemistry community.

chapter

SignalPU: A Programming Model for DSP Applications on Parallel and Heterogeneous Clusters

Farouk Mansouri, Sylvain Huet, Dominique Houzet

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 937 - 944

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Biomedical imagery, numeric communications, acoustic signal processing and many other DSP applications are increasingly present in the numeric world. They process growing data volumes, represented with more and more accuracy, and use complex algorithms with time constraints to satisfy. Consequently, a high requirement of computing power characterizes them. To satisfy this...

chapter

Computing Prestack Kirchhoff time migration algorithm on OpenCL and GPGPU

Zhanlin Yu, Xiaohua Shi

2014 International Conference on Mechatronics and Control (ICMC) > 1416 - 1419

2014 International Conference on Mechatronics and Control (ICMC)

In seismic data processing, Prestack Kirchhoff time migration is a forming method, and widely used. In this paper, we introduced how to port the original CUDA program to OpenCL, and how to implement and optimize Prestack Kirchhoff Time Migration algorithm on OpenCL and General Purpose GPU, and how to optimize the OpenCL program to get the competitive performance comparing with the original CUDA version...

chapter

KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs

Dmitry Mikushin, Nikolay Likhogrud, Eddy Z. Zhang, Christopher Bergstrom

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1011 - 1020

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

GPUs are becoming pervasive in scientific computing. Originally served as peripheral accelerators, now they are gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code such as OpenACC and HMPP simply off-load "hot" CPU code onto GPUs, entailing a lot of limitations such as unsupported external calls and coarse-grained...

chapter

Accurate and Efficient Three Level Design Space Exploration Based on Constraints Satisfaction Optimization Problem Solver

Shuo Li, Ahmed Hemani

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines > 174

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

In this paper, we propose an efficient and effective three-level Design Space Exploration (DSE) method for mapping a system consisting of a number of DSP functions onto an RTL or lower level model using constraint programming methodology. The design space has three dimensions: a) function execution schedule (when the functions should execute), b) function implementation assignment (how the execution...

chapter

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

Konstantinos Krommydas, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas, et al.

2013 International Conference on Parallel and Distributed Systems > 432 - 433

2013 International Conference on Parallel and Distributed Systems (ICPADS)

The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed...

chapter

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud

Jianlong Zhong, Bingsheng He

2013 IEEE 5th International Conference on Cloud Computing Technology and Science > 1 > 9 - 16

2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom)

Recently, we have witnessed that cloud providers start to offer heterogeneous computing environments. There have been wide interests in both clusters and cloud of adopting graphics processors (GPUs) as accelerators for various applications. On the other hand, large-scale graph processing is important for many data-intensive applications in the cloud. In this paper, we propose to leverage GPUs to accelerate...

chapter

A 2D Gaussian smoothing kernel mapped to heterogeneous platforms

A. Trabelsi, Y. Savaria

2013 IEEE 11th International New Circuits and Systems Conference (NEWCAS) > 1 - 4

2013 IEEE 11th International New Circuits and Systems Conference (NEWCAS)

In this paper, we present a comparative performance study of a 2D Gaussian blur kernel mapped to a heterogeneous multi-core CPU/GPU platform. In this study, the kernel workgroup, the Gaussian kernel and the image sizes are considered variable parameters. We aim to gain insight into how well the execution and data movement times evolve across each computing device in varying the values of these parameters...

chapter

XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures

Thierry Gautier, Joao V.F. Lima, Nicolas Maillard, Bruno Raffin

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1299 - 1308

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes, scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports a data-flow task...

Keywords:
KERNEL
PARALLEL PROCESSING
PROGRAMMING


INFONA - science communication portal
