Search results

Items from 1 to 20 out of 24 results

chapter

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms

Omer Erdil Albayrak, Ismail Akturk, Ozcan Ozturk

2012 41st International Conference on Parallel Processing Workshops > 81 - 88

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Many-core accelerators are being deployed in many systems to improve processing capabilities. In such systems, application mapping needs to be enhanced to maximize the utilization of the underlying architecture. Especially in GPUs, mapping becomes critical for multi-kernel applications, as kernels may exhibit different characteristics. While some of the kernels run faster on GPU, others may refer...

chapter

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Justin W. Richardson, Alan D. George, Herman Lam

2012 Symposium on Application Accelerators in High Performance Computing > 137 - 140

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance...

chapter

Accurate CUDA performance modeling for sparse matrix-vector multiplication

Ping Guo, Liqiang Wang

2012 International Conference on High Performance Computing & Simulation (HPCS) > 496 - 502

2012 International Conference on High Performance Computing & Simulation (HPCS)

This paper presents an integrated analytical and profile-based CUDA performance modeling approach to accurately predict the kernel execution times of sparse matrix-vector multiplication for CSR, ELL, COO, and HYB SpMV CUDA kernels. Based on our experiments conducted on a collection of 8 widely-used testing matrices on NVIDIA Tesla C2050, the execution times predicted by our model match the measured...

chapter

Fast Linear Algebra on GPU

Lukas Polok, Pavel Smrz

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 439 - 444

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs have been successfully used for acceleration of many mathematical functions and libraries. A common limitation of those libraries is a minimal size of primitives being handled in order to achieve significant speedups compared to their CPU versions. The minimal size requirement can prove prohibitive for many applications. It can be loosened by batching operations to have sufficient amount of data...

chapter

Energy Efficiency Analysis of GPUs

Juan M. Cebrián, Gines D. Guerrero, Jose M. Garcia

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1014 - 1022

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what is known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance per watt is really high. For massively...

chapter

phiGEMM: A CPU-GPU Library for Porting Quantum ESPRESSO on Hybrid Systems

Filippo Spiga, Ivan Girotto

2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing > 368 - 375

2012 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

GPU computing has revolutionized HPC by bringing the performance of the supercomputer to the desktop. Attractive price, performance, and power characteristics allow multiple GPUs to be plugged into both desktop machines and supercomputer nodes for increased performance. Excellent performance and scalability can be achieved for some problems using hybrid combinations of multiple GPUs and CPU...

chapter

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

Rohit Sinha, Aayush Prakash, Hiren D. Patel

17th Asia and South Pacific Design Automation Conference > 455 - 460

2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC)

This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs, and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional SystemC...

chapter

Accelerating multi-scale flows for LDDKBM diffeomorphic registration

Stefan Sommer

2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops) > 499 - 505

2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)

Registrations in medical imaging and computational anatomy can be obtained using the Large Deformation Diffeomorphic Kernel Bundle Mapping (LDDKBM) framework. This provides a registration algorithm with a solid mathematical foundation while incorporating regularization of deformation at multiple scales. Because the variational formulation of LDDKBM implies a heavy computational burden in the search...

chapter

Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications

Ying Zhang, Lu Peng, Bin Li, Jih-Kwon Peir, et al.

2011 IEEE International Symposium on Workload Characterization (IISWC) > 205 - 215

2011 IEEE International Symposium on Workload Characterization (IISWC)

In recent years, modern graphics processing units have been widely adopted in high performance computing areas to solve large scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores and the memory subsystem. In...

chapter

Improving GPU Robustness by making use of faulty parts

Artem Durytskyy, Mohamed Zahran, Ramesh Karri

2011 IEEE 29th International Conference on Computer Design (ICCD) > 346 - 351

2011 IEEE 29th International Conference on Computer Design (ICCD 2011)

With hundreds of processing units in current state-of-the-art graphics processing units (GPUs), the probability that one or more processing units fail due to permanent faults, during fabrication or post deployment, increases drastically. In our experiments we found that the loss of a single streaming multiprocessor (SM) in an 8-SM GPU resulted in as much as 16% performance loss. The default method...

chapter

Using GPUs to accelerate FPGA wirelength estimate for use with complex search operators

Christian Fobel, Gary Grewal, Deborah Stacey

2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE) > 1129 - 1134

2011 24th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)

As the precise wirelength for a given placement can only be known after routing, accurate and fast-to-compute wirelength estimates are required for FPGA placement algorithms. Two of the more effective wirelength estimation models are HPWL [1] and Star+ [2]. However, both of these models are expensive to compute, requiring O(nm) time, where n is the number of nets and m is the average number of blocks...

chapter

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications

Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, et al.

2011 IEEE International Parallel & Distributed Processing Symposium > 864 - 876

2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another...

chapter

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

C Gregg, K Hazelwood

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 134 - 144

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

General purpose GPU Computing (GPGPU) has taken off in the past few years, with great promises for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code...

chapter

Optimizing simulated annealing on GPU: A case study with IC floorplanning

Yiding Han, Sanghamitra Roy, Koushik Chakraborty

2011 12th International Symposium on Quality Electronic Design > 1 - 7

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

In this paper, we propose a novel floorplanning algorithm based on simulated annealing on GPUs. Simulated annealing is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Data (SIMD) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floorplan solution space, where we evaluate concurrent moves on a given...

chapter

Programming GPU Clusters with Shared Memory Abstraction in Software

Konstantinos I Karantasis, Eleftherios D Polychronopoulos

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 223 - 230

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

As many-core graphics processors gain an increasingly important position concerning the advancements on modern highly concurrent processors, we are experiencing the deployment of the first heterogeneous clusters that are based on GPUs. The attempts to match future expectations in computational power and energy saving with hybrid - GPU-based - clusters are expected to grow in the next years, and much...

chapter

Parallel cross-layer optimization of high-level synthesis and physical design

J Williamson, Yinghai Lu, Li Shang, Hai Zhou, et al.

16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011) > 467 - 472

2011 16th Asia and South Pacific Design Automation Conference, ASP-DAC 2011

Integrated circuit (IC) design automation has traditionally followed a hierarchical approach. Modern IC design flow is divided into sequentially-addressed design and optimization layers; each successively finer in design detail and data granularity while increasing in computational complexity. Eventual agreement across the design layers signals design closure. Obtaining design closure is a continual...

chapter

Acceleration of Functional Validation Using GPGPU

L Suresh, N Rameshan, M S Gaur, M Zwolinski, et al.

2011 Sixth IEEE International Symposium on Electronic Design, Test and Application > 211 - 216

2011 IEEE 6th International Workshop on Electronic Design, Test and Application (DELTA 2011)

Logic simulation of a VLSI chip is a computationally intensive process. There exists an urgent need to map functional validation algorithms onto parallel architectures to aid hardware designers in meeting time-to-market constraints. In this paper, we propose three novel methods for logic simulation of combinational circuits on GPGPUs. Initial experiments run on two methods using benchmark circuits...

chapter

Evaluating the potential of graphics processors for high performance embedded computing

Shuai Mu, Chenxi Wang, Ming Liu, Dongdong Li, et al.

2011 Design, Automation & Test in Europe > 1 - 6

2011 Design, Automation & Test in Europe

Today's high performance embedded computing applications are posing significant challenges for processing throughput. Traditionally, such applications have been realized on application specific integrated circuits (ASICs) and/or digital signal processors (DSP). However, ASICs' advantage in performance and power often could not justify the fast increasing fabrication cost, while current DSP offers...

chapter

CuMAPz: A tool to analyze memory access patterns in CUDA

Yooseong Kim, Aviral Shrivastava

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC) > 128 - 133

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC)

The CUDA programming model provides a simple interface to program on GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider several architectural details, and small changes in source code, especially in the memory access pattern, affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of a CUDA program...

chapter

Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing

B Betkaoui, D B Thomas, W Luk

2010 International Conference on Field-Programmable Technology > 94 - 101

2010 International Conference on Field-Programmable Technology (FPT 2010)

This paper provides the first comparison of performance and energy efficiency of high productivity computing systems based on FPGA (Field-Programmable Gate Array) and GPU (Graphics Processing Unit) technologies. The search for higher performance compute solutions has recently led to great interest in heterogeneous systems containing FPGA and GPU accelerators. While these accelerators can provide significant...

Data set:
ieee
Keywords:
KERNEL
BENCHMARK TESTING
GRAPHICS PROCESSING UNIT

