Search results

Items from 1 to 7 out of 7 results

chapter

A comprehensive performance analysis of HSA and OpenCL 2.0

Saoni Mukherjee, Yifan Sun, Paul Blinzer, Amir Kavyan Ziabari, et al.

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 183-193

Heterogeneous systems that combine CPUs and GPUs in a range of configurations are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is treated only as an accelerator, working as a slave to the CPU master. But recently we are starting to see the introduction...

chapter

Automatic OpenCL Code Generation for Multi-device Heterogeneous Architectures

Pei Li, Elisabeth Brunet, Francois Trahay, Christian Parrot, et al.

2015 44th International Conference on Parallel Processing (ICPP), pp. 959-968

Using multiple accelerators, such as GPUs or Xeon Phis, is an attractive way to improve the performance of large data-parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because going from a single accelerator to several requires dealing with potentially non-uniform domain decomposition, inter-accelerator...

chapter

Programming autonomous behavior of AMM network data concentrator by timed automata

Lukas Krejci

2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), vol. 1, pp. 214-219

The paper presents a new approach to programming the autonomous behavior of an AMM network data concentrator. The proposed method uses the timed automata systems defined by the UPPAAL team and extends them with event monitoring and with capabilities for producing and supervising asynchronous actions. Additionally, a new method of simulating timed automata systems is presented. This method utilizes principles of random order...

chapter

Efficient Barrier Synchronization for OpenMP-Like Parallelism on the Intel SCC

Hayder Al-Khalissi, Rainer Bucty, Mladen Berekovic

2013 International Conference on Parallel and Distributed Systems (ICPADS), pp. 10-17

The continuous increase in the number of processing cores on a die poses a new set of challenges for HPC application programming, including how to model, write, and verify software that has to use the full power of NoC-based manycore processors. Therefore, to simplify program development for the Single-chip Cloud Computer (SCC), it is desirable to have high-level, shared memory-based parallel programming...

chapter

An Empirical Performance Study of Chapel Programming Language

Nan Dun, Kenjiro Taura

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 497-506

In this paper we evaluate the performance of the Chapel programming language from the perspective of its language primitives and features, where the micro benchmarks are synthesized from our lessons learned in developing molecular dynamics simulation programs in Chapel. Experimental results show that most language building blocks have comparable performance to corresponding hand-written C code, while...

chapter

Improving Performance of Matrix Multiplication and FFT on GPU

Xiang Cui, Yifeng Chen, Hong Mei

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009), pp. 42-48

In this paper we discuss our experiences in improving the performance of two key algorithms: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and single-precision FFT using CUDA. The former is computation-intensive, while the latter is memory bandwidth or communication-intensive. A peak performance of 393 Gflops is achieved on NVIDIA GeForce GTX280 for the former,...

chapter

Phaser accumulators: A new reduction construct for dynamic parallelism

J. Shirako, D.M. Peixotto, V. Sarkar, W.N. Scherer

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1-12

A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser accumulators, a new reduction construct that meshes seamlessly with phasers to support dynamic parallelism in a phased (iterative) setting. By separating reduction computations into the parts of sending data, performing the...

Filter options

Data set:
ieee
Keywords:
SYNCHRONIZATION
PROGRAMMING
PERFORMANCE EVALUATION


INFONA - science communication portal
