Search results

Items from 41 to 60 out of 262 results

chapter

Accelerating Parameter Sweep Applications Using CUDA

M Motokubota, F Ino, K Hagihara

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 111 - 118

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

This paper proposes a parallelization scheme for parameter sweep (PS) applications using the compute unified device architecture (CUDA). Our scheme focuses on PS applications with irregular access patterns, which usually result in lower performance on the GPU. The key idea to resolve this irregularity is to exploit the similarity of data accesses between different parameters. That is, the scheme simultaneously...

chapter

Optimize or Wait? Using llc Fast-Prototyping Tool to Evaluate CUDA Optimizations

R Reyes, F de Sande

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 257 - 261

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

Over the last few years, we have witnessed the proliferation of GPU devices in HPC environments. Manufacturers produce new versions of their devices every few years, though, posing a new problem for scientists and engineers using their technology: is it worth the time and effort spent optimizing the codes for the current version? Or is it better to wait until a new architecture appears? In this paper,...

chapter

Accelerating Particle Swarm Algorithm with GPGPU

Miguel Cádenas-Montes, Miguel A Vega-Rodríguez, Juan José Rodríguez-Vázquez, Antonio Gómez-Iglesias

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 560 - 564

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

This paper focuses on solving large-size optimization problems using GPGPU. Evolutionary Algorithms for solving these optimization problems suffer from the curse of dimensionality, which implies that their performance deteriorates quickly as the dimensionality of the search space increases. This difficulty makes performance studies for very high-dimensional problems very challenging. Furthermore,...

chapter

Programming GPU Clusters with Shared Memory Abstraction in Software

Konstantinos I Karantasis, Eleftherios D Polychronopoulos

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 223 - 230

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

As many-core graphics processors gain an increasingly important position concerning the advancements on modern highly concurrent processors, we are experiencing the deployment of the first heterogeneous clusters that are based on GPUs. The attempts to match future expectations in computational power and energy saving with hybrid - GPU-based - clusters are expected to grow in the next years, and much...

chapter

Parallel cross-layer optimization of high-level synthesis and physical design

J Williamson, Yinghai Lu, Li Shang, Hai Zhou, more

16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011) > 467 - 472

2011 16th Asia and South Pacific Design Automation Conference, ASP-DAC 2011

Integrated circuit (IC) design automation has traditionally followed a hierarchical approach. Modern IC design flow is divided into sequentially-addressed design and optimization layers; each successively finer in design detail and data granularity while increasing in computational complexity. Eventual agreement across the design layers signals design closure. Obtaining design closure is a continual...

chapter

Profile assisted online system-level performance and power estimation for dynamic reconfigurable embedded systems

Jingqing Mu, R Lysecky

16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011) > 737 - 742

2011 16th Asia and South Pacific Design Automation Conference, ASP-DAC 2011

Significant research has demonstrated the performance and power benefits of runtime dynamic reconfiguration of FPGAs and microprocessor/FPGA devices. For dynamically reconfigurable systems, in which the selection of hardware coprocessors to implement within the FPGA is determined at runtime, online estimation methods are needed to evaluate the performance and power consumption impact of the hardware...

chapter

Acceleration of Functional Validation Using GPGPU

L Suresh, N Rameshan, M S Gaur, M Zwolinski, more

2011 Sixth IEEE International Symposium on Electronic Design, Test and Application > 211 - 216

2011 IEEE 6th International Workshop on Electronic Design, Test and Application (DELTA 2011)

Logic simulation of a VLSI chip is a computationally intensive process. There exists an urgent need to map functional validation algorithms onto parallel architectures to aid hardware designers in meeting time-to-market constraints. In this paper, we propose three novel methods for logic simulation of combinational circuits on GPGPUs. Initial experiments run on two methods using benchmark circuits...

chapter

Coordinate strip-mining and kernel fusion to lower power consumption on GPU

Guibin Wang

2011 Design, Automation & Test in Europe > 1 - 4

2011 Design, Automation & Test in Europe

Although general purpose GPUs have relatively high computing capacity, they also introduce high power consumption compared with general purpose CPUs. Therefore low-power techniques targeting GPUs will be one of the hottest topics in the future. On the other hand, in several application domains, users are unwilling to sacrifice performance to save power. In this paper, we propose an effective kernel...

chapter

A Scalable LDPC Decoder on GPU

K K Abburi

2011 24th International Conference on VLSI Design > 183 - 188

2011 24th International Conference on VLSI Design: concurrently with the 10th International Conference on Embedded Systems Design

A flexible and scalable approach for LDPC decoding on CUDA based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on GPU is challenging due to the limited amount of data-parallelism available in this algorithm. To overcome this...

chapter

GPU-based acceleration of MPIE/MoM matrix calculation for the analysis of microstrip circuits

Danilo De Donno, A Esposito, G Monti, L Tarricone

Proceedings of the 5th European Conference on Antennas and Propagation (EUCAP) > 3921 - 3924

2011 5th European Conference on Antennas and Propagation (EuCAP)

In this paper, we present a GPU-based algorithm which accelerates the MoM impedance matrix computation. Based on an efficient quasi-one-dimensional approximation of the reaction integrals, the MPIE formulation for the analysis of microstrip circuits is considered. We use NVIDIA CUDA as GPU development tool and choose an edge-connected line-fed patch antenna as reference problem. In order to demonstrate...

chapter

Gemma in April: A matrix-like parallel programming architecture on OpenCL

Tianji Wu, Di Wu, Yu Wang, Xiaorui Zhang, more

2011 Design, Automation & Test in Europe > 1 - 6

2011 Design, Automation & Test in Europe

Nowadays, the Graphics Processing Unit (GPU), as a kind of massively parallel processor, has been widely used in general-purpose computing tasks. Although there are mature development tools, it is not a trivial task for programmers to write GPU programs. Based on this consideration, we propose a novel parallel computing architecture. The architecture includes a parallel programming model, named Gemma,...

chapter

Evaluating the potential of graphics processors for high performance embedded computing

Shuai Mu, Chenxi Wang, Ming Liu, Dongdong Li, more

2011 Design, Automation & Test in Europe > 1 - 6

2011 Design, Automation & Test in Europe

Today's high performance embedded computing applications are posing significant challenges for processing throughput. Traditionally, such applications have been realized on application specific integrated circuits (ASICs) and/or digital signal processors (DSPs). However, ASICs' advantage in performance and power often could not justify the fast increasing fabrication cost, while current DSP offers...

chapter

Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing

B Betkaoui, D B Thomas, W Luk

2010 International Conference on Field-Programmable Technology > 94 - 101

2010 International Conference on Field-Programmable Technology (FPT 2010)

This paper provides the first comparison of performance and energy efficiency of high productivity computing systems based on FPGA (Field-Programmable Gate Array) and GPU (Graphics Processing Unit) technologies. The search for higher performance compute solutions has recently led to great interest in heterogeneous systems containing FPGA and GPU accelerators. While these accelerators can provide significant...

chapter

Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs

Ping Guo, Liqiang Wang

2010 International Conference on Computational and Information Sciences > 1154 - 1157

2010 International Conference on Computational and Information Sciences (ICCIS 2010)

The Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. Sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and...

chapter

Support Vector Machines on GPU with Sparse Matrix Format

Tsung-Kai Lin, Shao-Yi Chien

2010 Ninth International Conference on Machine Learning and Applications > 313 - 318

2010 Ninth International Conference on Machine Learning and Applications (ICMLA 2010)

The emerging general-purpose Graphics Processing Unit (GPU) provides a multi-core platform for a wide range of applications, including machine learning algorithms. In this paper, we propose several techniques to accelerate Support Vector Machines (SVM) on GPUs. Sparse matrix format is introduced into parallel SVM to achieve better performance. Experimental results show that the speedup of 55x-133.8x over LIBSVM...

chapter

CUDA Based Fast Implementation of Very Large Matrix Computation

Yinghong Sun, Yuanman Tong

2010 International Conference on Parallel and Distributed Computing, Applications and Technologies > 487 - 491

2010 11th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2010)

CUDA (Compute Unified Device Architecture) acceleration of very large scale matrix-vector and matrix-matrix multiplication is presented in this paper. The intrinsic parallelism in the matrix computations is exploited thoroughly. By dividing the entire matrix computation into multiple sub-groups, scalable performance improvement can be achieved using multiple GPUs. The key operations are accelerated...

chapter

Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor

Jian Wang, J Sohl, D Liu

2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing > 47 - 52

2010 IEEE/IFIP 8th International Conference on Embedded and Ubiquitous Computing (EUC 2010)

The host-multi-SIMD chip multiprocessor (CMP) architecture has been proven to be an efficient architecture for high performance signal processing, exploiting both task level parallelism through multi-core processing and data level parallelism through SIMD processors. Different from the cache-based memory subsystem in most general purpose processors, this architecture uses on-chip scratchpad memory (SPM)...

chapter

Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU

Guibin Wang, YiSong Lin, Wei Yi

2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing > 344 - 350

2010 IEEE/ACM Int'l Conference on Green Computing and Communications (GreenCom) and Int'l Conference on Cyber, Physical and Social Computing (CPSCom)

As one of the most popular accelerators, the Graphics Processing Unit (GPU) has demonstrated high computing power in several application fields. On the other hand, the GPU also has high power consumption and has become one of the largest power consumers in desktop and supercomputer systems. However, software power optimization methods targeting GPUs have not been well studied. In this work, we propose...

chapter

Analysis of Parallel Algorithms for Energy Conservation with GPU

Zhuowei Wang, Xianbin Xu, Naixue Xiong, L T Yang, more

2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing > 155 - 162

2010 IEEE/ACM Int'l Conference on Green Computing and Communications (GreenCom) and Int'l Conference on Cyber, Physical and Social Computing (CPSCom)

The GPU has recently gained considerable attention for delivering significant performance in applications ranging from scientific computing to database sorting and search. General-purpose computing on GPU can easily reduce the execution time but results in an associated increase in the energy consumption. This paper analyzes energy consumption of parallel algorithms executing on GPU and provides a methodology...

chapter

Accelerating global sequence alignment using CUDA compatible multi-core GPU

T R P Siriwardena, D N Ranasinghe

2010 Fifth International Conference on Information and Automation for Sustainability > 201 - 206

2010 5th International Conference on Information and Automation for Sustainability (ICIAfS)

The Graphical Processing Unit (GPU) has become a competitive general purpose computational hardware platform in the last few years. Recent improvements in GPUs' highly parallel programming capabilities, such as Compute Unified Device Architecture (CUDA), have led to a variety of complex applications with tremendous performance improvements. Genetic sequence alignment is considered to be one of the application...

Keywords:
KERNEL
COPROCESSORS


INFONA - science communication portal
