Search results

Items from 1 to 16 out of 16 results

chapter

vPHI: Enabling Xeon Phi Capabilities in Virtual Machines

Stefanos Gerangelos, Nectarios Koziris

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1333 - 1340

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Heterogeneous processing has gained popularity in the high performance computing (HPC) area lately, and it appears to have great potential for future data centers. In this regard, accelerators, such as GPUs and Intel Xeon Phi, have already started to play a significant role in HPC systems, offering a high degree of parallelism to application developers. Furthermore, hardware virtualization is gaining...

chapter

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

Miguel Tasende

2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) > 894 - 897

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high-performance, energy-efficient manycore Epiphany chip (used as a co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...

chapter

Reconfigurable coprocessors synthesis in the MPEG-RVC domain

Carlo Sau, Luca Fanni, Paolo Meloni, Luigi Raffo, et al.

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 8

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Flexibility and high efficiency are common design drivers in the embedded systems domain. Coarse-grained reconfigurable coprocessors can tackle these issues, but they suffer from complex design, debugging, and application-mapping problems. In this paper, we propose an automated design flow that aids developers in designing and managing coarse-grained reconfigurable coprocessors. It provides both the hardware...

chapter

VOCL-FT: introducing techniques for efficient soft error coprocessor recovery

Antonio J. Peña, Wesley Bland, Pavan Balaji

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient...

chapter

Automatic Tuning of a Parallel Pattern Library for Heterogeneous Systems with Intel Xeon Phi

Jiri Dokulil, Siegfried Benkner

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications > 42 - 49

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Pattern libraries are important tools for high-productivity application development. Their struggle for best performance is complicated by the fact that they are used to execute user-provided code, which is not known during their creation. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...

chapter

A coarse-grained reconfigurable wavelet denoiser exploiting the Multi-Dataflow Composer tool

Nicola Carta, Carlo Sau, Francesca Palumbo, Danilo Pani, et al.

2013 Conference on Design and Architectures for Signal and Image Processing > 141 - 148

2013 Conference on Design and Architectures for Signal and Image Processing (DASIP)

In the last few years, efficient resource management has turned out to be one of the major challenges for hardware designers. Strategies of reusability through reconfiguration have shown interesting potential to address it, while also minimizing power and area. The Multi-Dataflow Composer (MDC) tool has been presented to the scientific community to automatically build up runtime coarse-grained...

chapter

GPU-S2S: A Compiler for Source-to-Source Translation on GPU

Dan Li, Haijun Cao, Xiaoshe Dong, Bao Zhang

2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming > 144 - 148

Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010)

CUDA facilitates the development of General Purpose computing on Graphics Processing Units (GPGPU); however, its complex memory system, thread-level structure, and data transmission control between memories pose great challenges for GPU programming. In order to facilitate the development of parallel programs on GPUs and reuse existing sequential codes, in this paper we propose a novel directive...

chapter

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Ronald Babich, Michael A Clark, Balint Joó

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced...

chapter

Efficient template matching with variable size templates in CUDA

Nicholas Moore, Miriam Leeser, Laurie Smith King

2010 IEEE 8th Symposium on Application Specific Processors (SASP) > 77 - 80

2010 IEEE 8th Symposium on Application Specific Processors (SASP 2010)

Graphics processing units (GPUs) offer significantly higher peak performance than CPUs, but for a limited problem space. Even within this space, GPU solutions are often restricted to a set of specific problem instances or offer greatly varying performance for slightly different parameters. This makes providing a library of GPU implementations that is adaptable to arbitrary inputs a difficult task...

chapter

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

José Duato, Antonio J Peña, F Silla, R Mayo, et al.

2010 International Conference on High Performance Computing & Simulation > 224 - 231

2010 International Conference on High Performance Computing & Simulation (HPCS 2010)

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable...

chapter

A GPU-based architecture for real-time data assessment at synchrotron experiments

S Chilingaryan, A Kopmann, A Mirone, T dos Santos Rolo

2010 17th IEEE-NPSS Real Time Conference > 1 - 8

2010 17th Real-Time Conference - IEEE-NPSS Technical Committee on Computer Applications in Nuclear and Plasma Sciences (RT 2010)

Current imaging experiments at synchrotron beam lines often lack a real-time data assessment. X-ray imaging cameras installed at synchrotron facilities like ANKA provide millions of pixels, each with a resolution of 12 bits or more, and take up to several thousand frames per second. A given experiment can produce data sets of multiple gigabytes in a few seconds. Up to now the data is stored in local...

chapter

Exploiting graphic processing units parallelism to improve intelligent data acquisition system performance in JET's correlation reflectometer

J Nieto, G de Arcas, J Vega, M Ruiz, et al.

2010 17th IEEE-NPSS Real Time Conference > 1 - 4

2010 17th Real-Time Conference - IEEE-NPSS Technical Committee on Computer Applications in Nuclear and Plasma Sciences (RT 2010)

The performance of intelligent data acquisition systems relies heavily on their processing capabilities and local bus bandwidth, especially in applications with high sample rates or a high number of channels. This is the case for the self-adaptive sampling-rate data acquisition system installed as a pilot experiment in the KG8B correlation reflectometer at JET. The system, which is based on the ITMS platform,...

chapter

Dense linear algebra solvers for multicore with GPU accelerators

Stanimire Tomov, Rajib Nath, Hatem Ltaief, Jack Dongarra

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) > 1 - 8

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010)

Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these critical solvers in the area of dense linear algebra...

chapter

Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case

Abhijeet Gaikwad, Ioane Muni Toke

2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing > 607 - 614

18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)

In many numerical applications resulting from computational science and engineering problems, the solution of sparse linear systems is the most compute-intensive task, often prohibitively so. Consequently, the linear solvers need to be carefully chosen and efficiently implemented in order to harness the available computing resources. Krylov subspace based iterative solvers have been widely used for solving...

chapter

GPU-based simulation of cellular neural networks for image processing

R. Dolan, G. DeSouza

2009 International Joint Conference on Neural Networks > 730 - 735

2009 International Joint Conference on Neural Networks (IJCNN 2009 - Atlanta)

The inherent massive parallelism of cellular neural networks makes them an ideal computational platform for kernel-based algorithms and image processing. General-purpose GPUs provide similar massive parallelism, but it can be difficult to design algorithms to make optimal use of the hardware. The presented research includes a GPU abstraction based on cellular neural networks. The abstraction offers...

chapter

Fast computation of general Fourier Transforms on GPUs

L.D. Brandon, C. Boyd, N. Govindaraju

2008 IEEE International Conference on Multimedia and Expo > 5 - 8

2008 IEEE International Conference on Multimedia and Expo (ICME)

We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size that can fit in a texture. The basic building block for our algorithms is a radix-2 Stockham formulation of the FFT for power-of-two data sizes that avoids expensive bit reversals and exploits the high GPU memory bandwidth...
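The Stockham idea named in this abstract can be illustrated with a small CPU-side sketch in Python. This is a generic textbook radix-2 Stockham autosort FFT, not the authors' texture-based GPU implementation. It assumes complex input whose length is a power of two, and the function name `stockham_fft` is hypothetical. Each stage reads from one buffer and writes interleaved results to the other, so the output ends up in natural order without a separate bit-reversal pass.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (decimation in frequency).

    Hypothetical sketch: assumes len(x) is a power of two. Two buffers
    are ping-ponged between stages, so no bit-reversal permutation is needed.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]  # input of the current stage
    dst = [0j] * n                 # output of the current stage
    stride = 1                     # number of interleaved sub-transforms
    m = n                          # length of each sub-transform
    while m > 1:
        half = m // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / m)  # twiddle factor
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                # Butterfly: writing results at 2p and 2p+1 (interleaved)
                # is what keeps the final output in natural order.
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src  # ping-pong the buffers
        stride *= 2
        m = half
    return src


if __name__ == "__main__":
    # prints [4.0, 0.0, 0.0, 0.0]
    print([round(abs(v), 6) for v in stockham_fft([1, 1, 1, 1])])
```

Comparing against a direct O(n²) DFT on small inputs is an easy way to check a sketch like this; the paper's handling of real data and of sizes other than powers of two is not reproduced here.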

Filter options

Data set: ieee
Keywords: KERNEL, LIBRARIES, COPROCESSORS

Publication date

Set your own date range

Keywords

GRAPHICS PROCESSING UNIT (8)
COMPUTER GRAPHIC EQUIPMENT (7)
ACCELERATION (4)
HARDWARE (4)
RUNTIME (4)
CUDA (3)
GRAPHICS PROCESSING UNITS (3)
OPTIMIZATION (3)
BANDWIDTH (2)
COMPUTER ARCHITECTURE (2)
GPU (2)
GRAPHIC PROCESSING UNITS (2)
HIGH PERFORMANCE COMPUTING (2)
INSTRUCTION SETS (2)
MATHEMATICS COMPUTING (2)
PARALLEL PROCESSING (2)
PARALLEL PROGRAMMING (2)
SERVERS (2)
SPARSE MATRICES (2)
VIRTUALIZATION (2)
ALGORITHM DESIGN AND ANALYSIS (1)
ALGORITHM-SPECIFIC OPTIMIZATION (1)
ANKA (1)
APPLICATION PROGRAM INTERFACES (1)
APPROXIMATION METHODS (1)
ARCHITECTURE-SPECIFIC OPTIMIZATION (1)
ARRAYS (1)
ARTIFICIAL NEURAL NETWORKS (1)
AUTOMATIC MAPPING (1)
AUTOMATIC SOFTWARE TUNING (1)
AUTOMATIC SOURCE-TO-SOURCE TRANSLATION TOOL (1)
BENCHMARK TESTING (1)
BIG DATA (1)
BLAS (1)
BLIS (1)
BUFFER STORAGE (1)
C SEQUENTIAL CODE (1)
CELLULAR NEURAL NETS (1)
CELLULAR NEURAL NETWORKS (1)
CHECKPOINTING (1)
CHOLESKY FACTORIZATION (1)
CHROMA FRAMEWORK (1)
CLUSTERS (1)
COI (1)
COMPILER DIRECTIVE (1)
COMPLEX MEMORY SYSTEM (1)
COMPUTATIONAL ALGORITHMS (1)
COMPUTATIONAL FINANCE (1)
COMPUTATIONAL SCIENCE (1)
COMPUTER SCIENCE (1)
COMPUTERISED TOMOGRAPHY (1)
CONJUGATE GRADIENT SQUARED METHODS (1)
CONTEXT (1)
CORRELATION REFLECTOMETER (1)
CPU (1)
CUDA CODE (1)
DAQ SYSTEM (1)
DATA ACQUISITION (1)
DATA COMMUNICATION (1)
DATA MINING (1)
DATA QUALITY (1)
DATA TRANSMISSION CONTROL (1)
DATA VISUALISATION (1)
DATABASES (1)
DENSE LINEAR ALGEBRA SOLVERS (1)
DIGITAL SIMULATION (1)
DIRECTIVE BASED COMPILER GUIDED APPROACH (1)
DIRECTX9 API (1)
DISCRETE FOURIER TRANSFORMS (1)
DRIVER CIRCUITS (1)
EDUCATIONAL INSTITUTIONS (1)
EFFICIENT TEMPLATE MATCHING (1)
ENERGY CONSUMPTION (1)
ENERGY SAVING (1)
ERROR CORRECTION CODES (1)
FAST FOURIER TRANSFORMS (1)
FFT (1)
FINANCIAL DATA PROCESSING (1)
FINANCIAL ENGINEERING PROBLEMS (1)
FINITE IMPULSE RESPONSE FILTER (1)
GENERAL FOURIER TRANSFORMS (1)
GENERAL PURPOSE COMPUTING (1)
GENERAL PURPOSE GRAPHICS PROCESSING UNITS (1)
GPGPU (1)
GPU ABSTRACTION (1)
GPU ACCELERATORS (1)
GPU DEVICES (1)
GPU FFT IMPLEMENTATION (1)
GPU MEMORY (1)
GPU-BASED ACCELERATORS (1)
GPU-BASED ARCHITECTURE (1)
GPU-BASED SIMULATION (1)
GPU-S2S (1)
GRAPHICS (1)
GRAPHICS HARDWARE (1)
HARDWARE DESIGN LANGUAGES (1)
HIGH PERFORMANCE CLUSTERS (1)
more

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options