Search results

Items from 41 to 60 out of 594 results

chapter

Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU

Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, more

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing > 231 - 235

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD)

It is an important task to tune performance for sparse matrix vector multiplication (SpMV), but it is also a difficult task because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, which are stored in CSR format. With the blocking method, the corresponding...

chapter

Parallel one- and two-dimensional FFTs on GPGPUs

Mehrdad Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang

Anti-counterfeiting, Security, and Identification > 1 - 5

2012 International Conference on Anti-Counterfeiting, Security and Identification (2012 ASID)

This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4...

chapter

Preemption of a CUDA Kernel Function

Jon Calhoun, Hai Jiang

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing > 247 - 252

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD)

As graphics processing units (GPUs) gain adoption as general purpose parallel compute devices, several key problems need to be addressed in order for their use to become more practical and more user friendly. One such problem is that the special functions designed to execute on GPUs, called kernel functions, are non-preemptable. Once the kernel is issued to the GPU it will remain there till either execution...

chapter

The Fat-Link Computation on Large GPU Clusters for Lattice QCD

Guochun Shi, Ronald Babich, Michael A. Clark, Bálint Joó, more

2012 Symposium on Application Accelerators in High Performance Computing > 1 - 10

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

Graphics Processing Units (GPU) are becoming increasingly popular in high performance computing due to their high performance, high power efficiency and low cost. In this paper, we present results of an effort to implement the fatlink computation -- an important component of many lattice quantum chromodynamics (LQCD) calculations -- on GPU clusters using the QUDA framework. Two implementations, one...

chapter

Automatically Optimized GPU Acceleration of Element Subroutines in Finite Element Method

Jiří Filipovič, Jan Fousek, Bedřich Lakomý, Matúš Madzin

2012 Symposium on Application Accelerators in High Performance Computing > 141 - 144

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

The element subroutines in the finite element method (FEM) provide enough parallelism to be successfully accelerated by contemporary GPUs. However, their efficient implementation is not straightforward and requires time-consuming exploration of numerous implementation variants. In this paper, we present kernel fusion as an optimization technique and its application for element subroutines. Moreover,...

chapter

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Justin W. Richardson, Alan D. George, Herman Lam

2012 Symposium on Application Accelerators in High Performance Computing > 137 - 140

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance...

chapter

A Multi-Node GPGPU Implementation of Non-Linear Anisotropic Diffusion Filter

Vivek K. Pallipuram, Nimisha Raut, Xiaoyu Ren, Melissa C. Smith, more

2012 Symposium on Application Accelerators in High Performance Computing > 11 - 18

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

The quality of an image is highly critical for applications such as robotic vision, surveillance, medical imaging, etc. The images captured in real-time are seldom noise free and therefore require noise removal for further processing. Out of several proposed noise removal schemes, anisotropic diffusion filtering is known to achieve highly precise results. However, the accuracy comes at an expense...

chapter

On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering

Florian Wende, Frank Cordes, Thomas Steinke

2012 Symposium on Application Accelerators in High Performance Computing > 74 - 83

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

General-purpose graphics processing units (GPUs) have been found to be viable solutions for large-scale numerical computations with an inherent potential for massive parallelism. In contrast, little is known about using GPUs for small-scale computations. To have the GPU not be under-utilized for small problem sizes, a meaningful approach is to perform as many small-scale computations as possible...

chapter

Power Aware Computing on GPUs

Kiran Kasichayanula, Dan Terpstra, Piotr Luszczek, Stan Tomov, more

2012 Symposium on Application Accelerators in High Performance Computing > 64 - 73

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

Energy and power density concerns in modern processors have led to significant computer architecture research efforts in power-aware and temperature-aware computing. With power dissipation becoming an increasingly vexing problem, power analysis of Graphical Processing Unit (GPU) and its components has become crucial for hardware and software system design. Here, we describe our technique for a coordinated...

chapter

GPU-based Real-time Decoding Technique for High-definition Videos

Huifang Deng, Chunhui Deng, Jingjing Li

2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing > 186 - 190

2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP)

In this paper, we first discussed the video decoding standard and its architecture, and then analyzed the decoding complexity of each process. By using the benefit of the CUDA programming model, and taking advantage of the GPU to optimize the decoding process of MC (motion compensation) and CSC (color space conversion) that are very time consuming, we proposed a MC accelerating method based on CUDA, and...

chapter

Robust Real-Time Multiprocessor Interrupt Handling Motivated by GPUs

Glenn A. Elliott, James H. Anderson

2012 24th Euromicro Conference on Real-Time Systems > 267 - 276

2012 24th Euromicro Conference on Real-Time Systems (ECRTS)

Architectures in which multicore chips are augmented with graphics processing units (GPUs) have great potential in many domains in which computationally intensive real-time workloads must be supported. However, unlike standard CPUs, GPUs are treated as I/O devices and require the use of interrupts to facilitate communication with CPUs. Given their disruptive nature, interrupts must be dealt with carefully...

chapter

Makespan Computation for GPU Threads Running on a Single Streaming Multiprocessor

Kostiantyn Berezovskyi, Konstantinos Bletsas, Bjorn Andersson

2012 24th Euromicro Conference on Real-Time Systems > 277 - 286

2012 24th Euromicro Conference on Real-Time Systems (ECRTS)

Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed...

chapter

Supporting Preemptive Task Executions and Memory Copies in GPGPUs

Can Basaran, Kyoung-Don Kang

2012 24th Euromicro Conference on Real-Time Systems > 287 - 296

2012 24th Euromicro Conference on Real-Time Systems (ECRTS)

GPGPUs (General Purpose Graphic Processing Units) provide massive computational power. However, applying GPGPU technology to real-time computing is challenging due to the non-preemptive nature of GPGPUs. Especially, a job running in a GPGPU or a data copy between a GPGPU and CPU is non-preemptive. As a result, a high priority job arriving in the middle of a low priority job execution or memory copy...

chapter

Algorithmic strategies for optimizing the parallel reduction primitive in CUDA

Pedro J. Martin, Luis F. Ayuso, Roberto Torres, Antonio Gavilanes

2012 International Conference on High Performance Computing & Simulation (HPCS) > 511 - 519

2012 International Conference on High Performance Computing & Simulation (HPCS)

Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...

chapter

Acceleration of variance of color differences-based demosaicing using CUDA

Muhammad Ismail Faruqi, Fumihiko Ino, Kenichi Hagihara

2012 International Conference on High Performance Computing & Simulation (HPCS) > 503 - 510

2012 International Conference on High Performance Computing & Simulation (HPCS)

Image demosaicing algorithms are used to reconstruct a full color image from the incomplete color samples output (RAW data) of an image sensor overlaid with a Color Filter Array (CFA). Better demosaicing algorithms are superior in terms of acuity, dynamic range, signal to noise ratio, and artifact suppression, which make them suitable for high quality delivery such as theatrical broadcast. In this...

chapter

Accurate CUDA performance modeling for sparse matrix-vector multiplication

Ping Guo, Liqiang Wang

2012 International Conference on High Performance Computing & Simulation (HPCS) > 496 - 502

2012 International Conference on High Performance Computing & Simulation (HPCS)

This paper presents an integrated analytical and profile-based CUDA performance modeling approach to accurately predict the kernel execution times of sparse matrix-vector multiplication for CSR, ELL, COO, and HYB SpMV CUDA kernels. Based on our experiments conducted on a collection of 8 widely-used testing matrices on NVIDIA Tesla C2050, the execution times predicted by our model match the measured...

chapter

How to correctly deal with pseudorandom numbers in manycore environments: Application to GPU programming with Shoverand

Jonathan Passerat-Palmbach, David R. C. Hill

2012 International Conference on High Performance Computing & Simulation (HPCS) > 25 - 31

2012 International Conference on High Performance Computing & Simulation (HPCS)

Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more on General Purpose Graphics Processing Units (GPGPUs) to speed up stochastic simulations. Such...

chapter

Efficient Implementation of Evaluating Multivariate Quadratic System with GPUs

Satoshi Tanaka, Takashi Nishide, Kouichi Sakurai

2012 Sixth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing > 660 - 664

2012 Sixth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS)

The QUAD stream cipher uses multivariate polynomial systems. It has provable security based on the computational hardness assumption. More specifically, the security of QUAD depends on the hardness of solving non-linear multivariate systems over a finite field, which is known to be an NP-hard problem. However, QUAD is slower than other stream ciphers, and an efficient implementation, which has a reduced computational...

chapter

kNN-MST-Agglomerative: A fast and scalable graph-based data clustering approach on GPU

Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta, Pablo Moscato

2012 7th International Conference on Computer Science & Education (ICCSE) > 585 - 590

2012 7th International Conference on Computer Science & Education (ICCSE 2012)

Data clustering is a distinctive method for analyzing complex networks in terms of functional relationships of the comprising elements. A number of graph-based algorithms have been proposed so far to tackle the complexity of the problem and many of them are based on the representation of data in the form of a minimum spanning tree (MST). In this work, we propose a graph-based agglomerative clustering...

chapter

A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs

Rick Weber, Gregory D. Peterson

2012 Symposium on Application Accelerators in High Performance Computing > 19 - 25

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

Using GPUs as computational accelerators has been a growing area of research in the past several years. One particular area amenable to exploiting video card hardware is dense linear algebra. We continue this trend by generalizing the MAGMA xGEMM kernels, porting them to OpenCL and tuning them to run on the AMD 7970. Achieving up to 1.7 TFlops in SGEMM and 650 GFlops in DGEMM, we extend this performance...

Data set:
ieee
Keywords:
KERNEL
GRAPHICS PROCESSING UNIT

Content availability

Available (593)
None (1)

Publication type

book (547)
article (47)

Keywords

INSTRUCTION SETS (306)
GPU (191)
COPROCESSORS (164)
CUDA (145)
COMPUTER GRAPHIC EQUIPMENT (139)
COMPUTATIONAL MODELING (112)
COMPUTER ARCHITECTURE (106)
PARALLEL PROCESSING (106)
GPGPU (73)
OPTIMIZATION (72)
HARDWARE (64)
ARRAYS (62)
PROGRAMMING (55)
MEMORY MANAGEMENT (49)
ACCELERATION (48)
PERFORMANCE EVALUATION (47)
GRAPHICS PROCESSING UNITS (46)
MATHEMATICAL MODEL (42)
ALGORITHM DESIGN AND ANALYSIS (39)
VECTORS (37)
OPENCL (36)
PARALLEL ARCHITECTURES (35)
COMPUTE UNIFIED DEVICE ARCHITECTURE (34)
LIBRARIES (34)
SYNCHRONIZATION (34)
REGISTERS (33)
SPARSE MATRICES (33)
CENTRAL PROCESSING UNIT (31)
COMPUTER GRAPHICS (31)
PIXEL (31)
INDEXES (30)
PARALLEL ALGORITHMS (28)
MULTIPROCESSING SYSTEMS (27)
PARALLEL PROGRAMMING (27)
BANDWIDTH (26)
PARALLEL COMPUTING (26)
EQUATIONS (25)
BENCHMARK TESTING (24)
CONVOLUTION (21)
HIGH PERFORMANCE COMPUTING (21)
MULTICORE PROCESSING (21)
REAL TIME SYSTEMS (21)
GRAPHICS (19)
OPTIMISATION (19)
RUNTIME (19)
THREE DIMENSIONAL DISPLAYS (19)
THROUGHPUT (19)
FIELD PROGRAMMABLE GATE ARRAYS (18)
YARN (18)
IMAGE PROCESSING (17)
RANDOM ACCESS MEMORY (16)
CPU (15)
OPENMP (15)
FEATURE EXTRACTION (14)
GENETIC ALGORITHMS (14)
GPU COMPUTING (14)
GRAPHIC PROCESSING UNIT (14)
TILES (14)
ACCURACY (13)
DATABASES (13)
ENCODING (13)
IMAGE COLOR ANALYSIS (13)
IMAGE RECONSTRUCTION (13)
INTERPOLATION (13)
MATRIX MULTIPLICATION (13)
PIPELINES (13)
SERVERS (13)
LAYOUT (12)
MEDICAL IMAGE PROCESSING (12)
MESSAGE SYSTEMS (12)
MPI (12)
CLUSTERING ALGORITHMS (11)
CONTEXT (11)
DATA STRUCTURES (11)
EDUCATIONAL INSTITUTIONS (11)
ITERATIVE METHODS (11)
JACOBIAN MATRICES (11)
SORTING (11)
TRAINING (11)
ULTRASONIC IMAGING (11)
BIOINFORMATICS (10)
DECODING (10)
IMAGE SEGMENTATION (10)
LATTICES (10)
LINEAR ALGEBRA (10)
NVIDIA (10)
PERFORMANCE (10)
PROTEINS (10)
APPLICATION PROGRAM INTERFACES (9)
CLOCKS (9)
ENERGY CONSUMPTION (9)
ENERGY EFFICIENCY (9)
EVOLUTIONARY COMPUTATION (9)
GRAPHICS PROCESSING UNIT (GPU) (9)
MULTI-THREADING (9)
POLYNOMIALS (9)
SCHEDULES (9)
BIOLOGY COMPUTING (8)

INFONA - science communication portal
