Search results

Items 1 to 20 of 372 results

chapter

Aggressive pipelining of irregular applications on reconfigurable hardware

Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 575 - 586

CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...

chapter

Girls Who . . . Do Scratch a First Round with the Essence Kernel

Cassandra Balland, Nene Satorou Cisse, Louise Hergoualch, Gwendoline Kervot, et al.

2017 IEEE 30th Conference on Software Engineering Education and Training (CSEE&T) > 251 - 255

"Girls who..." is an education system belonging to the French national program "Accompanying in Science and Technology in the Primary School" (ASTEP). "Girls who..." is a girl network that develops and maintains a facility called the factory, addressing a double goal: setting an example of science performed by women and fostering science and technology in elementary schools...

chapter

Introducing parallel computing concepts in computer system related courses

Han Wan, Xiaopeng Gao, Xiang Long, Bo Jiang

2017 IEEE Frontiers in Education Conference (FIE) > 1 - 7

All semiconductor market domains are converging to concurrent platforms. This trend has certainly made it a real challenge to develop application software that effectively uses these concurrent processors to achieve efficiency and performance goals. This paper argues that the Computer System related courses are natural places to introduce parallelism, and the earlier to parallel computing concepts...

chapter

A programming model and runtime system for approximation-aware heterogeneous computing

Ioannis Parnassos, Nikolaos Bellas, Nikolaos Katsaros, Nikolaos Patsiatzis, et al.

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Heterogeneous platforms that include diverse architectures such as multicore CPUs, FPGAs and GPUs are becoming very popular due to their superior performance and energy efficiency. Besides heterogeneity, a promising approach for minimizing energy consumption is through approximate computing which relaxes the requirement that all parts of a program are considered equally important to the output quality,...

chapter

Evaluating high-level design strategies on FPGAs for high-performance computing

Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Field-Programmable Gate Arrays (FPGAs) are gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely unavailable to software programmers due to programming and debugging difficulties that are inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...

chapter

A GPU-Friendly Skiplist Algorithm

Nurit Moscovici, Nachshon Cohen, Erez Petrank

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 246 - 259

We propose a design for a fine-grained lock-based skiplist optimized for Graphics Processing Units (GPUs). While GPUs are often used to accelerate streaming parallel computations, it remains a significant challenge to efficiently offload concurrent computations with more complicated data-irregular access and fine-grained synchronization. Natural building blocks for such computations would be concurrent...

chapter

A hyper-parameter estimation algorithm in kernel based regularization approach for system identification using Kautz kernels

Takaaki Kondo, Yoshito Ohta

2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) > 599 - 601

A Bayesian approach for system identification using kernel functions is a popular method. The kernel functions are considered as certain prior knowledge about a target system, so selecting proper kernels is required. Recent studies show that it is successful to use OBF-s(orthonormal basis function)-based kernels as the kernel functions, but estimating hyper-parameters of the kernel functions is a...

chapter

3D tomography back-projection parallelization on FPGAs using OpenCL

Maxime Martelli, Nicolas Gag, Alain Merigot, Cyrille Enderli

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 1 - 6

This paper deals with the evaluation of the resurgence of FPGAs for hardware acceleration applied to computed tomography on the back-projection operator used in iterative reconstruction algorithms. We focus our attention on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, that promises a new level of hardware abstraction from the developer's perspective, allowing a...

chapter

UDORN: A design framework of persistent in-memory key-value database for NVM

Xianzhang Chen, Edwin H.-M. Sha, Ahmad Abdullah, Qingfeng Zhuge, et al.

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA) > 1 - 6

Emerging non-volatile memory (NVM) technologies provide opportunities to improve the performance of key-value databases (KVDBs) by deploying the database on NVM. However, existing in-memory KVDBs cannot fully exploit the advantages of NVM. They process data in an in-memory database and store an image on persistent storage via an underlying file system. The performance of database operations is degraded by...

chapter

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rosetti, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 151 - 160

While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...

chapter

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor

James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 432 - 441

The home-grown SW26010 many-core processor enabled the production of China’s first independently developed number-one ranked supercomputer – the Sunway TaihuLight. The design of the limited off-chip memory bandwidth, however, renders the SW26010 a highly memory-bound processor. To compensate for this limitation, the processor was designed with a unique hardware feature, "Register Level Communication"...

chapter

Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM

Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 45 - 52

Heterogeneous computing platforms containing a wide range of computing resources, from CPUs to specialized hardware accelerators, are the trend today, resulting from the physical limitations on processor speed and the increasing demand for computing performance. Hence many optimization strategies are studied to get better throughput and lower energy consumption in heterogeneous systems. Various memory...

chapter

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Michael Wagner, Victor Lopez, Julian Morillo, Carlo Cavazzoni, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 243 - 250

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. An increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the community of material science. Our approach focuses on, first, overlapping...

chapter

Overlapping Data Transfers with Computation on GPU with Tiles

Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 171 - 180

GPUs are employed to accelerate scientific applications; however, they require much more programming effort from the programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...

chapter

A pipeline functional language for stateful packet processing

Nicola Bonelli, Stefano Giordano, Gregorio Procissi

2017 IEEE Conference on Network Softwarization (NetSoft) > 1 - 4

The evolution of commodity PCs towards multi-core processing platforms equipped with high-speed network interfaces makes them reasonable and cost effective targets for the implementation of generic network functions. In addition, the availability of software accelerated I/O frameworks provides a convenient ground for running a broad variety of applications, from simple software switches to more complex...

chapter

OpenMP device offloading to FPGA accelerators

Lukas Sommer, Jens Korinth, Andreas Koch

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 201 - 205

Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.

chapter

Publish-subscribe programming for a NoC-based multiprocessor system-on-chip

Jean Carlo Hamerski, Geancarlo Abich, Ricardo Reis, Luciano Ost, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Shared memory and message passing are traditional parallel programming models used on multiprocessor system-on-chip environments. Underlying models are traditionally meant for static scenarios where all communicating entities and their intercommunication patterns are known a priori by the software engineer. The systems design following such programming models became complex due to dynamic behavior...

chapter

Enabling One-Sided Communication Semantics on ARM

Pavel Shamis, M. Graham Lopez, Gilad Shainer

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 805 - 813

In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...

chapter

Directive-Based Partitioning and Pipelining for Graphics Processing Units

Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-chun Feng

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 575 - 584

The community needs simpler mechanisms to access the performance available in accelerators, such as GPUs, FPGAs, and APUs, due to their increasing use in state-of-the-art supercomputers. Programming models like CUDA, OpenMP, OpenACC and OpenCL can efficiently offload compute-intensive workloads to these devices. By default these models naively offload computation without overlapping it with communication...

Filter options

Keywords:
KERNEL
PROGRAMMING

Content availability

Available (369)
Not available (3)

Keywords

GRAPHICS PROCESSING UNITS (104)
HARDWARE (82)
COMPUTER ARCHITECTURE (81)
COMPUTATIONAL MODELING (73)
INSTRUCTION SETS (62)
OPTIMIZATION (57)
PARALLEL PROCESSING (56)
GRAPHICS PROCESSING UNIT (55)
GPU (51)
CUDA (43)
OPENCL (39)
COMPUTER GRAPHIC EQUIPMENT (36)
COPROCESSORS (36)
PROGRAM PROCESSORS (36)
RUNTIME (35)
FIELD PROGRAMMABLE GATE ARRAYS (34)
ARRAYS (30)
LIBRARIES (30)
PERFORMANCE EVALUATION (30)
BENCHMARK TESTING (29)
REGISTERS (29)
SYNCHRONIZATION (26)
PARALLEL PROGRAMMING (25)
ALGORITHM DESIGN AND ANALYSIS (24)
LINUX (24)
MEMORY MANAGEMENT (23)
GPGPU (22)
DATA MINING (19)
OPENMP (18)
YARN (18)
BANDWIDTH (16)
COMPUTER GRAPHICS (16)
SUPPORT VECTOR MACHINES (16)
APPLICATION PROGRAM INTERFACES (15)
HIGH PERFORMANCE COMPUTING (15)
MULTIPROCESSING SYSTEMS (15)
MICROPROCESSOR CHIPS (14)
ACCELERATION (13)
CONTEXT (13)
FPGA (13)
GRAPHICS (13)
MPI (13)
PARALLEL ARCHITECTURES (13)
RANDOM ACCESS MEMORY (13)
JAVA (12)
STANDARDS (12)
COMPLEXITY THEORY (11)
COMPUTE UNIFIED DEVICE ARCHITECTURE (11)
DATA TRANSFER (11)
INDEXES (11)
MULTI-THREADING (11)
SERVERS (11)
MESSAGE PASSING (10)
OPERATING SYSTEMS (10)
PROGRAMMING MODEL (10)
SOFTWARE (10)
SOFTWARE ARCHITECTURE (10)
VECTORS (10)
DATABASES (9)
MAGNETIC CORES (9)
MESSAGE SYSTEMS (9)
MICROPROCESSORS (9)
MULTICORE PROCESSING (9)
OPERATING SYSTEM KERNELS (9)
CENTRAL PROCESSING UNIT (8)
EMBEDDED SYSTEMS (8)
MACHINE LEARNING (8)
REAL TIME SYSTEMS (8)
STREAMING MEDIA (8)
ACCELERATORS (7)
CRYPTOGRAPHY (7)
DATA MODELS (7)
DATA STRUCTURES (7)
GRAPHIC PROCESSING UNIT (7)
LINEAR PROGRAMMING (7)
PROGRAM COMPILERS (7)
REAL-TIME SYSTEMS (7)
RESOURCE MANAGEMENT (7)
ANALYTICAL MODELS (6)
CLASSIFICATION ALGORITHMS (6)
COMPUTER LANGUAGES (6)
DRIVER CIRCUITS (6)
ELECTRONICS PACKAGING (6)
IMAGE PROCESSING (6)
MATHEMATICAL MODEL (6)
NVIDIA GPU (6)
OBJECT ORIENTED MODELING (6)
OPENACC (6)
OPERATING SYSTEMS (COMPUTERS) (6)
OPTIMISATION (6)
PARALLEL COMPUTING (6)
PIPELINE PROCESSING (6)
RECONFIGURABLE ARCHITECTURES (6)
SCHEDULES (6)
SECURITY (6)
SEMANTICS (6)
SEMIDEFINITE PROGRAMMING (6)
SYSTEM-ON-CHIP (6)

INFONA - scholarly communication portal
