Search results

chapter

Sparse matrix assembly on the GPU through multiplication patterns

Rhaleb Zayer, Markus Steinberger, Hans-Peter Seidel

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 8

2017 IEEE High Performance Extreme Computing Conference (HPEC)

The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can become a bottleneck, especially in implicit and time-dependent settings where considerable updates are needed. On standard HPC platforms, this process can...
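Coalescing elementary contributions means summing every element-level entry that lands on the same (row, column) position of the global matrix. The following is a minimal sequential sketch in Python. It assumes 1D linear elements with a hypothetical constant stiffness `k` and a CSR output format. It illustrates only the coalescing step and is not the GPU multiplication-pattern technique of the paper.

```python
from collections import defaultdict


def assemble_csr(n_rows, triplets):
    """Coalesce (row, col, value) contributions into CSR arrays.

    Duplicate (row, col) pairs, e.g. from elements sharing a node,
    are summed; this is the 'coalescing' step of assembly.
    """
    # Accumulate contributions per row, keyed by column index.
    rows = [defaultdict(float) for _ in range(n_rows)]
    for r, c, v in triplets:
        rows[r][c] += v

    indptr, indices, data = [0], [], []
    for row in rows:
        for c in sorted(row):  # CSR keeps column indices ascending per row
            indices.append(c)
            data.append(row[c])
        indptr.append(len(indices))
    return indptr, indices, data


def element_contributions(elements, k):
    """Hypothetical 1D linear elements: element (i, j) adds the local
    stiffness k * [[1, -1], [-1, 1]] to the global matrix."""
    for i, j in elements:
        yield (i, i, k)
        yield (i, j, -k)
        yield (j, i, -k)
        yield (j, j, k)
```

For two elements `(0, 1)` and `(1, 2)` with `k = 1.0`, node 1 receives a contribution from each element. The coalesced matrix is therefore the tridiagonal `[[1, -1, 0], [-1, 2, -1], [0, -1, 1]]`.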

chapter

Cryptanalysis on GPUs with the Cube Attack: Design, Optimization and Performances Gains

Marco Cianfriglia, Stefano Guarino

2017 International Conference on High Performance Computing & Simulation (HPCS) > 753 - 760

2017 International Conference on High Performance Computing & Simulation (HPCS)

The cube attack is a flexible cryptanalysis technique with a simple and fascinating theoretical framework. It combines offline exhaustive searches over selected tweakable public/IV bits (the sides of the “cube”) with an online key-recovery phase. Although applicable to virtually any cipher, and generally praised by the research community, the real potential of the attack is still in question, and...
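The core idea of the cube attack is simple. You XOR-sum an output bit over all values of a chosen set of public bits (the cube). This cancels every monomial that does not contain all cube variables, and what remains is a "superpoly" in the key bits. The sketch below shows this on a hypothetical toy polynomial, `toy_cipher_bit`, with 3 key bits and 2 IV bits, which stands in for a real cipher. It is an illustration of the principle, not the authors' GPU implementation.

```python
from itertools import product


def toy_cipher_bit(key, iv):
    """Hypothetical toy output bit, a GF(2) polynomial in key and IV bits:
    f = v0*v1*k0 ^ v0*v1*k1 ^ v0*k2 ^ k1*k2 ^ v1
    """
    k0, k1, k2 = key
    v0, v1 = iv
    return (v0 & v1 & k0) ^ (v0 & v1 & k1) ^ (v0 & k2) ^ (k1 & k2) ^ v1


def cube_sum(f, key, cube_size):
    """XOR f over all 2**cube_size assignments of the cube (public) bits.

    A monomial missing any cube variable evaluates to 1 on an even number
    of assignments and cancels; monomials containing the whole cube
    survive, leaving the superpoly evaluated at this key.
    """
    acc = 0
    for iv in product((0, 1), repeat=cube_size):
        acc ^= f(key, iv)
    return acc
```

For this toy polynomial the superpoly over the cube `{v0, v1}` is `k0 ^ k1`. In the online phase, querying the black box over the four cube assignments with an unknown key therefore reveals one linear equation in the key bits.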

article

Disaggregation and Sharing of I/O Devices in Cloud Data Centers

Jun Suzuki, Yoichi Hidaka, Junichi Higuchi, Yuki Hayashi, more

IEEE Transactions on Computers > 2016 > 65 > 10 > 3013 - 3026

Input/output (I/O) devices such as a graphics processing unit and a solid-state drive are inserted into I/O slots of a host in data center platforms. In this configuration, the I/O devices are used exclusively by the host, resulting in inefficient resource usage. In addition, the maximum number of I/O devices that can be assigned to each host is limited by the number of its I/O slots. This...

chapter

Performance of parallel ChaCha20 stream cipher

Radu Velea, Florina Gurzau, Laurentiu Margarit, Ion Bica, more

2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI) > 391 - 396

2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI)

ChaCha20 is a stream cipher selected by Google to replace the now obsolete RC4 in the Chrome browser and on Android devices. This article discusses the performance implications of parallelizing ChaCha20 across multicore CPUs and GPUs. The serial implementation used to derive the parallel code is part of the BoringSSL encryption library. We used OpenMP and OpenCL to accelerate the cipher and obtain...
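ChaCha20 parallelizes well because each 64-byte keystream block depends only on the key, the nonce, and a block counter, so blocks can be generated in any order. Below is a minimal pure-Python sketch of the RFC 8439 block function that makes this independence explicit. It is not the BoringSSL, OpenMP, or OpenCL code from the paper. The thread pool only demonstrates out-of-order generation: in CPython the GIL prevents a real speedup for this pure-Python arithmetic.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter_round(s, a, b, c, d):
    """ChaCha quarter round on state indices a, b, c, d (in place)."""
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block (RFC 8439: 32-byte key, 12-byte nonce)."""
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    state += list(struct.unpack("<8I", key))
    state += [counter & MASK]
    state += list(struct.unpack("<3I", nonce))
    w = state[:]
    for _ in range(10):  # 20 rounds = 10 double rounds
        quarter_round(w, 0, 4, 8, 12)  # column rounds
        quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14)
        quarter_round(w, 3, 7, 11, 15)
        quarter_round(w, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13)
        quarter_round(w, 3, 4, 9, 14)
    # Add the input state back in and serialize little-endian.
    return struct.pack("<16I", *[(x + y) & MASK for x, y in zip(w, state)])


def chacha20_xor(key, nonce, data, counter=1, workers=4):
    """Encrypt or decrypt by XOR with the keystream. Block i uses counter + i
    and nothing else, so blocks are generated independently by the pool."""
    n_blocks = (len(data) + 63) // 64
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda i: chacha20_block(key, counter + i, nonce),
                          range(n_blocks))
        keystream = b"".join(blocks)
    return bytes(x ^ y for x, y in zip(data, keystream))
```

Because encryption is an XOR with the keystream, calling `chacha20_xor` twice with the same key, nonce, and counter returns the original plaintext. On a GPU, the same structure maps naturally to one thread (or work-item) per block.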

chapter

Characterizing Large Dataset GPU Compute Workloads Targeting Systems with Die-Stacked Memory

Srividya Ramanathan, Gautam Hazari, Kanishka Lahiri, Francesco Spadini

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 204 - 213

2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory, makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications...

chapter

Implementation of Gaussian and Box Kernel Based Approximation of Bilateral Filter Using OpenCL

Honey Gupta, Daniel Sanju Antony, Rathna G. N.

2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA) > 1 - 5

2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

A bilateral filter is an edge-preserving, smoothing, non-linear filter. It consists of two kernels, namely spatial and range kernels, which can be constant or arbitrary. Algorithms for bilateral filtering with constant-time computational complexity exist today, but their execution time is too high for real-time applications. Also, hardware latency and throughput sometimes reduce...
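The two-kernel structure can be seen in a direct, brute-force 1D bilateral filter. Each output sample is a normalized average of its neighbours, weighted by a spatial Gaussian (distance in position) times a range Gaussian (difference in intensity). This is a sketch of the textbook definition with Gaussian kernels. It is not the constant-time Gaussian/box approximation or the OpenCL implementation the paper describes, and the parameters `radius`, `sigma_s`, and `sigma_r` are illustrative.

```python
import math


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force 1D bilateral filter with Gaussian spatial and range kernels."""
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial kernel: penalizes distance between positions i and j.
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range kernel: penalizes intensity difference, preserving edges.
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)
    return out
```

With a small `sigma_r`, samples across a sharp step receive almost zero weight, so the step survives. As `sigma_r` grows, the range weights approach 1 and the filter degenerates into an ordinary Gaussian blur. The per-sample cost grows with `radius`, which is what constant-time approximations aim to avoid.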

chapter

Replicating the Performance Evaluation of an N-Body Application on a Manycore Accelerator

Vinicius Garcia Pinto, Vinicius Alves Herbstrith, Lucas Mello Schnorr

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) > 19 - 24

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)

Reproducibility for High Performance Computing (HPC) systems has been discussed for some time already, but more work should be carried out to cover the latest accelerators that equip the fastest supercomputers, such as those listed in the TOP500. In this paper, we perform a replication of a performance evaluation carried out using an N-Body OpenMP parallel application on a Xeon Phi accelerator. We also...

chapter

A Load-Distributed Linpack Implementation for Heterogeneous Clusters

David Rohr, Volker Lindenstruth

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 436 - 443

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems pose different demands on the computer architecture, some general-purpose supercomputers consist of different types of nodes, where each type is best suited for certain applications...

chapter

Development of Scientific Software for HPC Architectures Using OpenACC: The Case of LQCD

Claudio Bonati, Enrico Calore, Simone Coscetti, Massimo D'elia, more

2015 IEEE/ACM 1st International Workshop on Software Engineering for High Performance Computing in Science > 9 - 15

2015 IEEE/ACM 1st International Workshop on Software Engineering for High Performance Computing in Science (SE4HPCS)

Many scientific software applications that solve complex compute- or data-intensive problems, such as large parallel simulations of physics phenomena, increasingly use HPC systems in order to achieve scientifically relevant results. An increasing number of HPC systems adopt heterogeneous node architectures, combining traditional multi-core CPUs with energy-efficient massively parallel accelerators,...

chapter

Computation-to-core mapping strategies for iso-surface volume rendering on GPUs

Junpeng Wang, Fei Yang, Yong Cao

2015 IEEE Pacific Visualization Symposium (PacificVis) > 153 - 157

2015 IEEE Pacific Visualization Symposium (PacificVis)

The ray casting algorithm is a major component of direct volume rendering; it exhibits inherent parallelism, making it suitable for graphics processing units (GPUs). However, blindly mapping the ray casting algorithm onto a GPU's complex parallel architecture can result in an order-of-magnitude performance loss. In this paper, a novel computation-to-core mapping strategy, called Warp Marching, for the texture-based...

chapter

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs

Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, more

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been demonstrated, this deterministic approach requires costly communication at each step of the factorization. Since such communication is becoming increasingly...
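The random-sampling alternative to QRCP can be illustrated with a standard randomized range finder. You multiply A by a Gaussian random matrix, orthonormalize the tall-skinny result with an unpivoted QR, and project A onto that basis. This replaces the per-step pivot selection of QRCP with one large matrix product and one QR. The sketch assumes NumPy, and the `oversample` and `seed` parameters are illustrative. The paper's GPU implementation and sampling scheme may differ.

```python
import numpy as np


def randomized_low_rank(A, k, oversample=10, seed=0):
    """Rank-k approximation A ~= L @ R via Gaussian random sampling.

    1. Sample the range of A: Y = A @ Omega, Omega Gaussian.
    2. Orthonormalize the samples with an unpivoted thin QR: Y = Q R.
    3. Project: B = Q^T A, so A ~= Q B; truncate B to rank k via a small SVD.
    Requires k + oversample <= min(A.shape).
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ omega)  # reduced QR: Q has k + oversample columns
    B = Q.T @ A
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Scale the leading k left singular vectors by their singular values.
    L = (Q @ U[:, :k]) * s[:k]
    return L, Vt[:k]
```

For a matrix of exact rank k, the random samples span its range with probability one, so the reconstruction error is at round-off level. For general matrices, a few extra samples (`oversample`) tighten the approximation.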

chapter

Real-time pedestrian detection Using OpenCL

Rong Sun, Xuzhi Wang, Xuannan Ye

2014 International Conference on Audio, Language and Image Processing > 401 - 404

2014 International Conference on Audio, Language and Image Processing (ICALIP)

Pedestrian detection is a challenging task due to the wide variety of appearances, especially in complex real-world scenes. Real-time pedestrian detection is valuable for a broad range of applications in multiple domains, such as surveillance and intelligent transportation systems. In this paper we present a fast implementation of a robust pedestrian detector using OpenCL, which is a...

chapter

The GAP project - GPU for realtime applications in high energy physics and medical imaging

R. Ammendola, M. Bauce, A. Biagioni, R. Fantechi, more

2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC) > 1 - 7

2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC)

We describe a pilot project for the use of GPUs (Graphics Processing Units) in online triggering applications for high energy physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems for data acquisition, such as commercial multicore PC farms, and the reduction of trigger...

chapter

Acceleration of a physical-optics simulator using CUDA

Sebastian Hegler, Mantvydas Kalibatas, Marco Mutze, Christoph Statz, more

CEM'13 Computational Electromagnetics International Workshop > 42 - 44

2013 Computational Electromagnetics Workshop (CEM)

This paper reports on the current state of a work-in-progress port of a physical-optics simulation tool to NVIDIA's CUDA platform. Current accelerator APIs are briefly presented. Our choice of the CUDA platform is explained, as is the data flow of the simulation tool. The current state of the implementation of the port is presented, as are first run time measurements. The results are promising;...

chapter

Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis

Adam Betts, Alastair Donaldson

2013 25th Euromicro Conference on Real-Time Systems > 193 - 202

2013 25th Euromicro Conference on Real-Time Systems (ECRTS)

The massive parallelism offered by Graphics Processing Units (GPUs) is now routinely exploited to accelerate computationally intensive tasks in a wide variety of application domains. Efficient GPU programming in languages such as CUDA and OpenCL requires careful application of hand optimisations to exploit parallelism and locality while minimising synchronisation. The effectiveness of such optimisations...

chapter

Heterogeneous parallel facilities

John Sanders

2013 Science and Information Conference > 182 - 187

2013 Science and Information Conference (SAI)

The paper describes heterogeneous parallel processing as a feature of hardware devices. Software supports the configuration of the hardware components and a new kind of system-software supports the distribution of data and the scheduling of tasks. The concept is supported by referring to the relatively recent Open Systems specification, OpenCL. This is briefly described and its likely evolution surmised...

INFONA - science communication portal
