Search results

Items from 41 to 60 out of 372 results

chapter

A Comparative Study of SYCL, OpenCL, and OpenMP

Hercules Cardoso Da Silva, Flavia Pisani, Edson Borin

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 61 - 66

Recent trends indicate that future computing systems will be composed of a group of heterogeneous computing devices, including CPUs, GPUs, and other hardware accelerators. These devices provide increased processing performance; however, creating efficient code for them may require that programmers manage memory assignments and use specialized APIs, compilers, or runtime systems, thus making their...

chapter

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing

Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, more

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 10

Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale data parallel workloads, but are considered weak in processing serialized tasks and communicating with other devices. Pursuing a CPU-GPU collaborative computing model which takes advantage of both devices could provide an important breakthrough in realizing the full performance potential of heterogeneous computing...

chapter

DT-CGRA: Dual-track coarse-grained reconfigurable architecture for stream applications

Xitian Fan, Huimin Li, Wei Cao, Lingli Wang

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 9

This paper presents a new type of coarse-grained reconfigurable architecture (CGRA) for the object inference domain in machine learning. The proposed CGRA is optimized for stream processing, and a corresponding programming model, called the dual-track model, is proposed. The CGRA is realized in Verilog HDL and implemented in an SMIC 55 nm process, with a footprint of 3.79 mm² and consuming 1.79 W at 500 MHz...

chapter

Fast kernel fuzzy c-means algorithms based on difference of convex programming

Li Chen, Shuisheng Zhou, Xintao Gao

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 1090 - 1095

In this study, we propose three new algorithms based on difference of convex (DC) programming and the DC algorithm (DCA) for the kernel fuzzy c-means (KFCM) clustering model. First, the KFCM model is reformulated into two equivalent DC programs, for which different KFCM algorithms are designed. Then, to further accelerate the second DCA-based KFCM algorithm, we adopt an approximate strategy which...

chapter

GraVF: A vertex-centric distributed graph processing framework on FPGAs

Nina Engelhardt, Hayden Kwok-Hay So

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

FPGAs are promising platforms to efficiently execute distributed graph algorithms. Unfortunately, they are notoriously hard to program, especially when the problem size and system complexity increase. In this paper, we propose GraVF, a high-level design framework for distributed graph processing on FPGAs. It leverages the vertex-centric paradigm, which is naturally distributed and requires the user...

chapter

FPGA-based accelerator design from a domain-specific language

M. Akif Ozkan, Oliver Reiche, Frank Hannig, Jurgen Teich

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 9

A large portion of image processing applications come with stringent requirements regarding performance, energy efficiency, and power. FPGAs have proven to be among the most suitable architectures for algorithms that can be processed in a streaming pipeline. Yet, designing imaging systems for FPGAs remains a very time-consuming task. High-Level Synthesis, which has significantly improved due...

chapter

Taming Big Data Scheduling with Locality-Aware Scheduling

Mao Ye, Jun Wang, Jiangling Yin, Dezhi Han

2016 International Conference on Advanced Cloud and Big Data (CBD) > 37 - 44

Incorporating the MPI programming model into a data-intensive file system for big data applications is significant for performance optimization research. In this paper, we ported an MPI-SVM solver, originally developed for HPC environments, to the Hadoop distributed file system (HDFS). We analyzed the performance bottlenecks that the SVM solver will confront on HDFS. It is known the...

chapter

Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning

Konstantinos Krommydas, Ruchira Sasanka, Wu-chun Feng

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 213 - 218

Programming FPGAs has been an arduous task that requires extensive knowledge of hardware design languages (HDLs), such as Verilog or VHDL, and low-level hardware details. With OpenCL support for FPGAs, the design, prototyping and implementation of an FPGA is increasingly moving towards a much higher level of abstraction, when compared to the intrinsically low-level nature of HDLs. On the other hand,...

chapter

An OpenCL-based framework for rapid virtual prototyping of heterogeneous architectures

Efstathios Sotiriou-Xanthopoulos, Leonard Masing, Kostas Siozios, George Economakos, more

2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS) > 372 - 377

The increasing performance and power requirements in embedded systems have led to a variety of heterogeneous hardware architectures, featuring many different types of processing elements. This heterogeneity, however, induces extra effort on system development and programming. To address this heterogeneity, OpenCL provides a portable programming model which enables the use of one source code in various...

chapter

CID: A Compile-Time Implementation Decider for Heterogeneous Platforms Based on C++ Attributes

Luis Miguel Sanchez, David del Rio Astorga, Manuel F. Dolz, Javier Fernandez

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) > 1149 - 1156

With the emergence of heterogeneous architectures, the development of parallel software has become an increasingly complex issue. The use of multiple programming models targeted at specific devices has turned the implementation process into a challenging task that comes along with a variety of difficulties. In this sense, developers are preoccupied with finding ways to alleviate the burden...

chapter

A pipeline-based runtime technique for improving Ray-Tracing on HSA-compliant systems

Chih-Chen Kao, Yu-Tsung Miao, Wei-Chung Hsu

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

The prevalence of real-time multimedia delivery appliances has led to the development of a variety of efficient architectures and supporting software technologies. In particular, Ray-Tracing, a well-known physically-based rendering algorithm, has been receiving great attention in research and development. Unfortunately, the Ray-Tracing algorithm, being one of the irregular applications, suffers from the...

chapter

Employing Compression Solutions under OpenACC

Ebad Salehi, Ahmad Lashgar, Amirali Baniasadi

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 348 - 356

For GPUs to achieve their peak performance, effective and efficient usage of memory bandwidth is necessary. To this end, programmers invest extensive development effort to optimize a GPU program, especially its memory bandwidth usage. The OpenACC programming model has been introduced to tackle the complexity of programming accelerators. However, this model's coarse-grained control over a program can make...

chapter

Vertex-Centric Graph Processing on FPGA

Nina Engelhardt, Hayden Kwok-Hay So

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 92

Past research and implementation efforts have shown that FPGAs are efficient at processing many graph algorithms. However, they are notoriously hard to program, leading to impractically long development times even for simple applications. We propose a vertex-centric framework for graph processing on FPGAs, providing a base execution model and distributed architecture so that developers need only write...

chapter

Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs

Konstantinos Krommydas, Ahmed E. Helal, Anshuman Verma, Wu-Chun Feng

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 198

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance has come at the significant expense of programmability, i.e., the performance-programmability gap. In particular,...

chapter

Research Report: Analysis of Software for Restricted Computational Environment Applicability

Jacob I. Torrey, Jonathan Miodownik

2016 IEEE Security and Privacy Workshops (SPW) > 185 - 188

Preliminary experiment design and research goals are presented to measure the applicability of restricted computational complexity environments in general purpose development efforts. The Linux kernel is examined through the lens of LangSec in order to gain insight into the make-up of the kernel code vis-à-vis the complexity class of recognizer for input to each component on the Chomsky Hierarchy...

chapter

Accelerating frequency-domain simulations using small shared-memory CPU/GPU cluster

Tomasz Topa, Artur Noga, Andrzej Karwowski

2016 21st International Conference on Microwave, Radar and Wireless Communications (MIKON) > 1 - 4

A numerical approach to frequency response problems usually requires that the system governing equation be solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from the traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...

chapter

Composable, parameterizable templates for high-level synthesis

Janarbek Matai, Dajung Lee, Alric Althoff, Ryan Kastner

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 744 - 749

High-level synthesis tools aim to make FPGA programming easier by raising the level of programming abstraction. Yet in order to get an efficient hardware design from HLS tools, the designer must know how to write HLS code that results in an efficient low level hardware architecture. Unfortunately, this requires substantial hardware knowledge, which limits wide adoption of HLS tools outside of hardware...

chapter

Performance-centric scheduling with task migration for a heterogeneous compute node in the data center

Achim Losch, Tobias Beisel, Tobias Kenter, Christian Plessl, more

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 912 - 917

The use of heterogeneous computing resources, such as Graphic Processing Units or other specialized coprocessors, has become widespread in recent years because of their performance and energy efficiency advantages. Approaches for managing and scheduling tasks to heterogeneous resources are still subject to research. Although queuing systems have recently been extended to support accelerator resources,...

chapter

A Component Based Graphical Parallel Programming Approach for Numerical Simulation Development

Liao Li, Mo Zeyao, Zhang Aiqing

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS) > 298 - 303

Building massively parallel numerical simulations is not easy, owing to continual changes in parallel programming models and the various software technologies needed. We develop a component-based graphical parallel programming approach to lower the difficulty of coding applications in scientific and engineering computing and to support rapid development of large-scale simulations based on a domain specific...

chapter

A comprehensive performance analysis of HSA and OpenCL 2.0

Saoni Mukherjee, Yifan Sun, Paul Blinzer, Amir Kavyan Ziabari, more

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 183 - 193

Heterogeneous systems that marry CPUs and GPUs together in a range of configurations are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is only treated as an accelerator by the CPU, working as a slave to the CPU master. But recently we are starting to see the introduction...

Keywords:
KERNEL
PROGRAMMING

INFONA - science communication portal
