Search results

Items from 1 to 20 out of 20 results

chapter

Experimentation of vision algorithm performance using custom OpenCL™ vector language extensions for a graphical accelerator with vector architecture

Bogdan Ditu, Fred Peterson, Ciprian Arbone

2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP) > 339 - 346

2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP)

OpenCL is a standard that supports a parallel programming paradigm which enables heterogeneous multi-core systems and also offers a high level of portability for the application. Some of the systems that are used with OpenCL might have vector capabilities at device compute units level. There are more ways the vector capabilities could be exploited by the OpenCL device application, the most common...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

Translating OpenACC to LLVM IR with SPIR kernels

Hao-Wei Peng, Jean Jyh-Jiun Shann

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) > 1 - 6

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS)

In general, highly parallelized programs executed on heterogeneous multiprocessor platforms may get better performance than homogeneous ones. OpenCL is one of the standards for parallel programming of heterogeneous multiprocessor platforms and SPIR (Standard Portable Intermediate Representation) is a portable binary format for representing OpenCL kernel code. However, the programming of these programs...

chapter

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Ashwin Mandayam Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng

2015 IEEE International Conference on Cluster Computing > 42 - 51

2015 IEEE International Conference on Cluster Computing (CLUSTER)

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue," to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding...

chapter

A generic infrastructure for OpenCL performance analysis

Robert Dietrich, Ronny Tschuter

2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 1 > 334 - 341

2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)

OpenCL is an open standard for programming of parallel heterogeneous systems. It is designed for portability, therefore being utilized in the area of embedded system programming as well as high performance computing (HPC). Due to the applicability on different platforms, OpenCL library vendors have a certain freedom in implementing parts of the OpenCL execution model. Multiple versions of the standard...

chapter

OpenACC Programs Examined: A Performance Analysis Approach

Robert Dietrich, Guido Juckeland, Michael Wolfe

2015 44th International Conference on Parallel Processing > 310 - 319

2015 44th International Conference on Parallel Processing (ICPP)

The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...

chapter

Extensions over OpenCL for Latency Reduction and Critical Applications

Grigore Lupescu, Emil-Ioan Slusanschi, Nicolae Tapus

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) > 379 - 385

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

Hardware and software stack complexity make programming GPGPUs difficult and limit application portability. This article first discusses challenges imposed by the current hardware and software model in GPGPU systems which relies heavily on the HOST device (CPU). We then identify system bottlenecks both in the hardware design and in the software stack and present two ideas to extend the HOST and DEVICE...

chapter

Heterogeneous Habanero-C (H2C): A Portable Programming Model for Heterogeneous Processors

Deepak Majeti, Vivek Sarkar

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 708 - 717

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

Heterogeneous architectures with their diverse architectural features impose significant programmability challenges. Existing programming systems involve non-trivial learning and are not productive, not portable, and are challenging to tune for performance. In this paper, we introduce Heterogeneous Habanero-C (H2C), which is an implementation of the Habanero execution model for modern heterogeneous...

chapter

Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL

Guido Juckeland, Alexander Grund, Wolfgang E. Nagel

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 689 - 698

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations...

chapter

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

Yuan Wen, Zheng Wang, Michael F. P. O'Boyle

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 10

2014 21st International Conference on High Performance Computing (HiPC)

Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application dedicated devices to platforms that need...

chapter

An OpenCL runtime system for a heterogeneous many-core virtual platform

Kuan-Chung Chen, Chung-Ho Chen

2014 IEEE International Symposium on Circuits and Systems (ISCAS) > 2197 - 2200

2014 IEEE International Symposium on Circuits and Systems (ISCAS)

We present a many-core full system simulation platform and its OpenCL runtime system. The OpenCL runtime system includes an on-the-fly compiler and resource manager for the ARM-based many-core platform. Using this platform, we evaluate approaches of work-item scheduling and memory management in OpenCL memory hierarchy. Our experimental results show that scheduling work-items on a many-core system...

chapter

Extending OpenSHMEM for GPU Computing

S. Potluri, D. Bureddy, H. Wang, H. Subramoni, et al.

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1001 - 1012

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...

chapter

For Three Easy Payments: Scoring Peptides with Portable Performance Using Specmaster

Rick Weber, Gregory D. Peterson, Robert Hettich

2012 Symposium on Application Accelerators in High Performance Computing > 102 - 110

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

In shotgun proteomics, matching peptides to tandem mass spectrometry data is a computationally expensive process that in some cases can take days using conventional software packages. Even though many existing search engines such as Sequest, Myrimatch, and X!Tandem now exploit multiple processors via threading libraries, they leave much on the table in terms of performance and don't exploit computational...

chapter

Directive-based Programming for GPUs: A Comparative Study

Ruymán Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 410 - 417

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need of implementing new algorithms from scratch, or adapting sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...

chapter

GPU-based Cloud computing for comparing the structure of protein binding sites

Matthias Leinweber, Lars Baumgartner, Marco Mernberger, Thomas Fober, et al.

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST) > 1 - 6

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST 2012) - Complex Environment Engineering

In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been...

chapter

Communication Library to Overlap Computation and Communication for OpenCL Application

Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 567 - 573

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

User-friendly parallel programming environments, such as CUDA and OpenCL are widely used for accelerators. They provide programmers with useful APIs, but the APIs are still low level primitives. Therefore, in order to apply communication optimization techniques, such as double buffering techniques, programmers have to manually write the programs with the primitives. Manual communication optimization...

chapter

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 56 - 67

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Recently, Intel has introduced a research prototype many core processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores in a single chip and each core has its own L1 and L2 caches without any hardware support for cache coherence. It allows maximum 64GB size of external memory that can be accessed by all cores and each core...

chapter

A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems

Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, Hiroaki Kobayashi

2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications > 135 - 142

2011 IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA)

In this paper, we propose a runtime performance prediction model for automatic selection of accelerators to execute kernels in OpenCL. The proposed method is a history-based approach that uses profile data for performance prediction. The profile data are classified into some groups, from each of which its own performance model is derived. As the execution time of a kernel depends on some runtime parameters...

chapter

Hybrid OpenCL: Connecting Different OpenCL Implementations over Network

R Aoki, S Oikawa, R Tsuchiyama, T Nakamura

2010 10th IEEE International Conference on Computer and Information Technology > 2729 - 2735

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

We are developing Hybrid OpenCL, which enables the connection between different OpenCL implementations over the network. Hybrid OpenCL consists of two elements, a runtime system that provides the abstraction of different OpenCL implementations and a bridge program that connects multiple OpenCL runtime systems over the network. Hybrid OpenCL enables the construction of the scalable OpenCL environments...

chapter

Speculative execution on multi-GPU systems

Gregory Diamos, Sudhakar Yalamanchili

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 12

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order...

Filter options

Data set:
ieee
Keywords:
KERNEL
RUNTIME
OPENCL

INFONA - science communication portal