Search results

Items from 1 to 20 out of 398 results

chapter

Aggressive pipelining of irregular applications on reconfigurable hardware

Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 575 - 586

CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...

chapter

Girls Who . . . Do Scratch a First Round with the Essence Kernel

Cassandra Balland, Nene Satorou Cisse, Louise Hergoualch, Gwendoline Kervot, et al.

2017 IEEE 30th Conference on Software Engineering Education and Training (CSEE&T) > 251 - 255

"Girls who..." is an education system belonging to the French national program "Accompanying in Science and Technology in the Primary School" (ASTEP). "Girls who..." is a girls' network that develops and maintains a facility called the factory, addressing a double goal: setting an example of science performed by women and fostering science and technology in elementary schools...

chapter

Introducing parallel computing concepts in computer system related courses

Han Wan, Xiaopeng Gao, Xiang Long, Bo Jiang

2017 IEEE Frontiers in Education Conference (FIE) > 1 - 7

All semiconductor market domains are converging to concurrent platforms. This trend has made it a real challenge to develop application software that effectively uses these concurrent processors to achieve efficiency and performance goals. This paper argues that computer system related courses are natural places to introduce parallelism, and the earlier to parallel computing concepts...

chapter

A programming model and runtime system for approximation-aware heterogeneous computing

Ioannis Parnassos, Nikolaos Bellas, Nikolaos Katsaros, Nikolaos Patsiatzis, et al.

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Heterogeneous platforms that include diverse architectures such as multicore CPUs, FPGAs and GPUs are becoming very popular due to their superior performance and energy efficiency. Besides heterogeneity, a promising approach for minimizing energy consumption is through approximate computing which relaxes the requirement that all parts of a program are considered equally important to the output quality,...

chapter

Evaluating high-level design strategies on FPGAs for high-performance computing

Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Field-Programmable Gate Arrays (FPGAs) are gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely unavailable to software programmers due to programming and debugging difficulties that are inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...

chapter

A GPU-Friendly Skiplist Algorithm

Nurit Moscovici, Nachshon Cohen, Erez Petrank

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 246 - 259

We propose a design for a fine-grained lock-based skiplist optimized for Graphics Processing Units (GPUs). While GPUs are often used to accelerate streaming parallel computations, it remains a significant challenge to efficiently offload concurrent computations with more complicated data-irregular access and fine-grained synchronization. Natural building blocks for such computations would be concurrent...

chapter

A hyper-parameter estimation algorithm in kernel based regularization approach for system identification using Kautz kernels

Takaaki Kondo, Yoshito Ohta

2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) > 599 - 601

A Bayesian approach to system identification using kernel functions is a popular method. The kernel functions encode prior knowledge about the target system, so proper kernels must be selected. Recent studies show that kernels based on orthonormal basis functions (OBFs) work well as the kernel functions, but estimating hyper-parameters of the kernel functions is a...

chapter

3D tomography back-projection parallelization on FPGAs using OpenCL

Maxime Martelli, Nicolas Gag, Alain Merigot, Cyrille Enderli

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 1 - 6

This paper evaluates the resurgence of FPGAs for hardware acceleration, applied to the back-projection operator used in iterative reconstruction algorithms for computed tomography. We focus our attention on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective, allowing a...

chapter

UDORN: A design framework of persistent in-memory key-value database for NVM

Xianzhang Chen, Edwin H.-M. Sha, Ahmad Abdullah, Qingfeng Zhuge, et al.

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA) > 1 - 6

Emerging non-volatile memory (NVM) technologies provide opportunities to improve the performance of key-value databases (KVDBs) by deploying the database on NVM. However, existing in-memory KVDBs cannot fully exploit the advantages of NVM. They process data in an in-memory database and store an image on persistent storage via an underlying file system. The performance of database operations is degraded by...

chapter

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rosetti, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 151 - 160

While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...

chapter

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor

James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 432 - 441

The home-grown SW26010 many-core processor enabled the production of China’s first independently developed number-one ranked supercomputer – the Sunway TaihuLight. The design of the limited off-chip memory bandwidth, however, renders the SW26010 a highly memory-bound processor. To compensate for this limitation, the processor was designed with a unique hardware feature, "Register Level Communication"...

chapter

Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM

Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 45 - 52

Heterogeneous computing platforms containing a wide range of computing resources, from CPUs to specialized hardware accelerators, are the trend today, resulting from the physical limitations on processor speed and the increasing demand for computing performance. Hence many optimization strategies are studied to get better throughput and lower energy consumption in heterogeneous systems. Various memory...

chapter

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Michael Wagner, Victor Lopez, Julian Morillo, Carlo Cavazzoni, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 243 - 250

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. An increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping...

chapter

Overlapping Data Transfers with Computation on GPU with Tiles

Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 171 - 180

GPUs are employed to accelerate scientific applications; however, they require much more programming effort from programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...

chapter

A pipeline functional language for stateful packet processing

Nicola Bonelli, Stefano Giordano, Gregorio Procissi

2017 IEEE Conference on Network Softwarization (NetSoft) > 1 - 4

The evolution of commodity PCs towards multi-core processing platforms equipped with high-speed network interfaces makes them reasonable and cost-effective targets for the implementation of generic network functions. In addition, the availability of software-accelerated I/O frameworks provides a convenient ground for running a broad variety of applications, from simple software switches to more complex...

chapter

OpenMP device offloading to FPGA accelerators

Lukas Sommer, Jens Korinth, Andreas Koch

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 201 - 205

Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.

article

The Unicorn Runtime: Efficient Distributed Shared Memory Programming for Hybrid CPU-GPU Clusters

Tarun Beri, Sorav Bansal, Subodh Kumar

IEEE Transactions on Parallel and Distributed Systems > 2017 > 28 > 5 > 1518 - 1534

Programming hybrid CPU-GPU clusters is hard. This paper addresses this difficulty and presents the design and runtime implementation of Unicorn, a parallel programming model for hybrid CPU-GPU clusters. In particular, this paper proves that efficient distributed shared memory style programming is possible and its simplicity can be retained across CPUs...

chapter

Publish-subscribe programming for a NoC-based multiprocessor system-on-chip

Jean Carlo Hamerski, Geancarlo Abich, Ricardo Reis, Luciano Ost, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Shared memory and message passing are traditional parallel programming models used on multiprocessor system-on-chip environments. Underlying models are traditionally meant for static scenarios where all communicating entities and their intercommunication patterns are known a priori by the software engineer. The systems design following such programming models became complex due to dynamic behavior...

chapter

Enabling One-Sided Communication Semantics on ARM

Pavel Shamis, M. Graham Lopez, Gilad Shainer

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 805 - 813

In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...

Data set: ieee
Keywords: KERNEL, PROGRAMMING

Content availability

Available (395)
None (3)

Publication type

book (372)
article (26)

Keywords

GRAPHICS PROCESSING UNITS (112)
COMPUTER ARCHITECTURE (88)
HARDWARE (86)
COMPUTATIONAL MODELING (79)
INSTRUCTION SETS (64)
OPTIMIZATION (63)
PARALLEL PROCESSING (60)
GRAPHICS PROCESSING UNIT (55)
GPU (53)
CUDA (45)
OPENCL (42)
PROGRAM PROCESSORS (40)
RUNTIME (40)
COPROCESSORS (38)
COMPUTER GRAPHIC EQUIPMENT (37)
FIELD PROGRAMMABLE GATE ARRAYS (35)
ARRAYS (32)
PERFORMANCE EVALUATION (32)
REGISTERS (32)
BENCHMARK TESTING (31)
LIBRARIES (31)
SYNCHRONIZATION (29)
PARALLEL PROGRAMMING (26)
LINUX (25)
ALGORITHM DESIGN AND ANALYSIS (24)
GPGPU (24)
MEMORY MANAGEMENT (24)
DATA MINING (19)
OPENMP (19)
YARN (19)
SUPPORT VECTOR MACHINES (18)
BANDWIDTH (17)
COMPUTER GRAPHICS (16)
RANDOM ACCESS MEMORY (16)
APPLICATION PROGRAM INTERFACES (15)
GRAPHICS (15)
HIGH PERFORMANCE COMPUTING (15)
MICROPROCESSOR CHIPS (15)
MULTIPROCESSING SYSTEMS (15)
STANDARDS (14)
ACCELERATION (13)
COMPUTE UNIFIED DEVICE ARCHITECTURE (13)
CONTEXT (13)
FPGA (13)
JAVA (13)
MPI (13)
PARALLEL ARCHITECTURES (13)
OPERATING SYSTEMS (12)
VECTORS (12)
COMPLEXITY THEORY (11)
DATA TRANSFER (11)
INDEXES (11)
MAGNETIC CORES (11)
MESSAGE PASSING (11)
MULTI-THREADING (11)
REAL TIME SYSTEMS (11)
SERVERS (11)
MESSAGE SYSTEMS (10)
MULTICORE PROCESSING (10)
PROGRAMMING MODEL (10)
SOFTWARE (10)
SOFTWARE ARCHITECTURE (10)
CENTRAL PROCESSING UNIT (9)
DATABASES (9)
EMBEDDED SYSTEMS (9)
MACHINE LEARNING (9)
MICROPROCESSORS (9)
OPERATING SYSTEM KERNELS (9)
ACCELERATORS (8)
DATA STRUCTURES (8)
OBJECT ORIENTED MODELING (8)
PROGRAM COMPILERS (8)
REAL-TIME SYSTEMS (8)
STREAMING MEDIA (8)
ANALYTICAL MODELS (7)
CRYPTOGRAPHY (7)
DATA MODELS (7)
GRAPHIC PROCESSING UNIT (7)
LINEAR PROGRAMMING (7)
PARALLEL COMPUTING (7)
RESOURCE MANAGEMENT (7)
SCHEDULES (7)
SCHEDULING (7)
SEMANTICS (7)
SEMIDEFINITE PROGRAMMING (7)
SYSTEM-ON-CHIP (7)
TRAINING (7)
C LANGUAGE (6)
CLASSIFICATION ALGORITHMS (6)
COMPUTER LANGUAGES (6)
CUDA PROGRAMMING MODEL (6)
DRIVER CIRCUITS (6)
EDUCATION (6)
EDUCATIONAL INSTITUTIONS (6)
ELECTRONICS PACKAGING (6)
IMAGE PROCESSING (6)
MATHEMATICAL MODEL (6)
NVIDIA GPU (6)

INFONA - science communication portal
