Search results

Items from 81 to 100 out of 594 results

chapter

Fast Sparse Matrix-Vector Multiplication on Graphics Processing Unit for Finite Element Analysis

Abal-Kassim Cheik Ahamed, Frederic Magoules

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 1307 - 1314

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Finite element analysis involves the solution of linear systems described by large size sparse matrices. Iterative Krylov methods are well suited for such type of problems. These methods require linear algebra operations, including sparse matrix-vector multiplication which can be computationally expensive for large size matrices. In this paper, we present the best way to perform these operations,...
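Krylov solvers such as conjugate gradient typically perform one SpMV per iteration, so this kernel tends to dominate their run time. The sketch below is a minimal sequential reference for y = Ax. It assumes the common compressed sparse row (CSR) format; the abstract does not say which format or kernel the authors found best. The function name `csr_spmv` is illustrative only.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Return y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Each iteration of this outer loop is independent; a GPU kernel would
    # typically give each row its own thread (scalar CSR) or warp (vector CSR).
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]], the shape of a
# 1D Laplacian stiffness matrix on a uniform mesh.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [0.0, 0.0, 4.0]
```

CSR is shown because it is the usual baseline; GPU-oriented formats such as ELLPACK or hybrid layouts trade extra storage for more regular memory access.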

chapter

Directive-based Programming for GPUs: A Comparative Study

Ruymán Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 410 - 417

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need of implementing new algorithms from scratch, or adapting sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...

chapter

Rootbeer: Seamlessly Using GPUs from Java

Philip C. Pratt-Szeliga, James W. Fawcett, Roy D. Welch

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 375 - 380

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

When converting a serial program to a parallel program that can run on a Graphics Processing Unit (GPU) the developer must choose what functions will run on the GPU. For each function the developer chooses, he or she needs to manually write code to: 1) serialize state to GPU memory, 2) define the kernel code that the GPU will execute, 3) control the kernel launch and 4) deserialize state back to CPU...

chapter

A novel GPU implementation of eigenanalysis for risk management

Mustafa U. Torun, Ali N. Akansu

2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) > 490 - 494

2012 IEEE 13th Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2012)

Portfolio risk is commonly defined as the standard deviation of its return. The empirical correlation matrix of asset returns in a portfolio has its intrinsic noise component. This noise is filtered for more robust performance. Eigendecomposition is a widely used method for noise filtering. Jacobi algorithm has been a popular eigensolver technique due to its stability. We present an efficient GPU...
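For readers unfamiliar with the Jacobi eigensolver mentioned above, here is a minimal sequential sketch of the classic cyclic Jacobi method for a real symmetric matrix such as a correlation matrix. It is not the authors' GPU implementation, which the truncated abstract does not describe; parallel Jacobi variants generally apply rotations on disjoint index pairs concurrently. The names `jacobi_eigen`, `tol` and `max_sweeps` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(a: Matrix, tol: float = 1e-12,
                 max_sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigensolver for a real symmetric matrix.

    Returns (eigenvalues, V) where column i of V is the eigenvector for
    eigenvalues[i]. The input matrix is not modified.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal Frobenius norm is negligible.
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n)
                            for j in range(n) if i != j))
        if off < tol:
            break
        # One sweep: rotate away every off-diagonal pair (p, q) once.
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                # Smaller root of t^2 + 2*theta*t - 1 = 0, for stability.
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A P: update columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- P^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # V <- V P: accumulate the rotation into the eigenvectors.
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


# Toy two-asset correlation matrix with correlation 0.5.
eigvals, vecs = jacobi_eigen([[1.0, 0.5], [0.5, 1.0]])
print(sorted(eigvals))  # approximately [0.5, 1.5]
```

Noise filtering of the kind the abstract describes would then act on these eigenvalues (for example, by replacing the small "noise" eigenvalues) before rebuilding the matrix; that step is omitted here.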

chapter

Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators Based on a Domain-Specific Language for Medical Imaging

Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, et al.

2012 11th International Symposium on Parallel and Distributed Computing > 211 - 218

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

An efficient memory bandwidth utilization for GPU accelerators is crucial for memory bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels only a fraction of the compute power provided by GPU accelerators can be exploited and performance is predetermined by memory bandwidth...

chapter

Teaching Parallel Programming Models on a Shallow-Water Code

Alexander Breuer, Michael Bader

2012 11th International Symposium on Parallel and Distributed Computing > 301 - 308

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

We present a software package that supports teaching different parallel programming models in a computational science and engineering context. It implements a Finite Volume solver for the shallow water equations, with application to tsunami simulation in mind. The numerical model is kept simple, using patches of Cartesian grids as computational domain, which can be connected via ghost layers. The...

chapter

Parallel UPGMA Algorithm on Graphics Processing Units Using CUDA

Yu-Rong Chen, Che Lun Hung, Yu-Shiang Lin, Chun-Yuan Lin, et al.

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 849 - 854

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The construction of phylogenetic trees is important for the computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by the UPGMA often is not a true tree unless the molecular clock assumption holds, the UT is still useful for the clocklike data. However,...
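UPGMA itself is simple to state: repeatedly merge the two closest clusters, place the new node at half their distance (the ultrametric assumption), and update distances as size-weighted averages. The sequential sketch below is a plain O(n³) reference, not the authors' CUDA version; the nested-tuple tree representation `(left, right, height)` and the name `upgma` are illustrative choices.

```python
from typing import List, Tuple, Union

# A tree is either a leaf label or (left subtree, right subtree, node height).
Tree = Union[str, Tuple["Tree", "Tree", float]]


def upgma(labels: List[str], dist: List[List[float]]) -> Tree:
    """Build an ultrametric tree from a symmetric distance matrix."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (tree, size)
    # Distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # The closest pair is the step a parallel version would search for
        # concurrently (e.g. a min-reduction over the distance matrix).
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the rest.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[k] = (ni * dik + nj * djk) / (ni + nj)
        d = {key: val for key, val in d.items() if i not in key and j not in key}
        for k, val in new_d.items():
            d[(k, next_id)] = val  # next_id exceeds every existing id
        clusters[next_id] = ((ti, tj, dij / 2.0), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(labels, dist))  # ('D', ('C', ('A', 'B', 1.0), 3.0), 5.0)
```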

chapter

An Adaptative Multi-GPU Based Branch-and-Bound. A Case Study: The Flow-Shop Scheduling Problem

I. Chakroun, N. Melab

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 389 - 395

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Solving exactly Combinatorial Optimization Problems (COPs) using a Branch-and-Bound (B&B) algorithm requires a huge amount of computational resources. Therefore, we recently investigated designing B&B algorithms on top of graphics processing units (GPUs) using a parallel bounding model. The proposed model assumes parallelizing the evaluation of the lower bounds on pools of sub-problems...

chapter

Fast Linear Algebra on GPU

Lukas Polok, Pavel Smrz

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 439 - 444

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs have been successfully used for acceleration of many mathematical functions and libraries. A common limitation of those libraries is a minimal size of primitives being handled in order to achieve significant speedups compared to their CPU versions. The minimal size requirement can prove prohibitive for many applications. It can be loosened by batching operations to have sufficient amount of data...

chapter

An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units

Walid Abu-Sufah, Asma Abdel Karim

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 453 - 460

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Sparse matrix vector multiplication, SpMV, is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units, GPUs, have been deployed to enhance the performance of this operation. We present a blocked version of the Transposed Jagged Diagonal storage format which is tailored for GPUs, BTJAD. We develop a highly optimized SpMV kernel that takes advantage of the properties...
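BTJAD is described as a blocked, transposed variant of jagged diagonal storage. As background only, the sketch below shows the simpler, non-blocked jagged diagonal (JAD) format: rows are permuted by decreasing nonzero count, and the k-th "jagged diagonal" collects the k-th nonzero of every row that has one. Because each jagged diagonal is stored contiguously, consecutive GPU threads (one per permuted row) read consecutive memory. This is not the authors' BTJAD layout, and the helper names `to_jad` and `jad_spmv` are hypothetical.

```python
from typing import List, Tuple


def to_jad(dense: List[List[float]]) -> Tuple[List[int], List[int], List[int], List[float]]:
    """Convert a dense matrix to (perm, jd_ptr, col_idx, values) in JAD format."""
    n = len(dense)
    rows = [[(j, v) for j, v in enumerate(r) if v != 0.0] for r in dense]
    # Permute rows by decreasing number of nonzeros (stable sort).
    perm = sorted(range(n), key=lambda i: -len(rows[i]))
    max_nnz = len(rows[perm[0]]) if n else 0
    jd_ptr, col_idx, values = [0], [], []
    for k in range(max_nnz):
        for i in perm:
            if len(rows[i]) <= k:
                break  # all remaining rows are shorter, since perm is sorted
            j, v = rows[i][k]
            col_idx.append(j)
            values.append(v)
        jd_ptr.append(len(values))
    return perm, jd_ptr, col_idx, values


def jad_spmv(perm, jd_ptr, col_idx, values, x):
    """Return y = A @ x for A stored in JAD format."""
    y = [0.0] * len(perm)
    for k in range(len(jd_ptr) - 1):
        start = jd_ptr[k]
        # On a GPU, index i would be the thread id within this jagged diagonal.
        for i in range(jd_ptr[k + 1] - start):
            y[perm[i]] += values[start + i] * x[col_idx[start + i]]
    return y


A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
print(jad_spmv(*to_jad(A), [1.0, 1.0, 1.0]))  # [3.0, 3.0, 15.0]
```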

chapter

SURF cascade face detection acceleration on Sandy Bridge processor

Eric Li, Liu Yang, Bin Wang, Jianguo Li, et al.

2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops > 41 - 47

2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops)

Along with the inclusion of GPU cores within the same CPU die, the performance of Intel's processor-graphics has been significantly improved over earlier generation of integrated graphics. This paper presents a highly optimized SURF cascade based face detector which efficiently exploits both CPU and GPU computing power on the latest Sandy Bridge processor. The SURF cascade classifier procedure is...

chapter

Vehicle detection and tracking using Mean Shift segmentation on semi-dense disparity maps

Sebastien Lefebvre, Sebastien Ambellouis

2012 IEEE Intelligent Vehicles Symposium > 855 - 860

2012 IEEE Intelligent Vehicles Symposium (IV)

This paper describes an original joint obstacle detection and tracking method based on a Mean Shift algorithm and semi-dense disparity maps. The semi-dense disparity maps are computed with a local 1D fuzzy scanline stereo matching approach. Each map is associated to a confidence map that is used to remove bad matches. The Mean Shift algorithm is applied to simultaneously extract each vehicle and track...
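The Mean Shift step named above is a mode-seeking iteration: a point repeatedly moves to the kernel-weighted mean of the samples around it until the shift becomes negligible. The sketch below shows generic Gaussian-kernel mean shift on 2D points; it is not the authors' segmentation and tracking pipeline on disparity maps, and the name `mean_shift_mode` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def mean_shift_mode(points: Sequence[Sequence[float]], start: Sequence[float],
                    bandwidth: float, tol: float = 1e-6,
                    max_iter: int = 100) -> List[float]:
    """Follow the Gaussian-kernel mean shift from `start` to a density mode."""
    x = list(start)
    for _ in range(max_iter):
        num = [0.0] * len(x)
        den = 0.0
        for p in points:
            d2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += w
            for k in range(len(x)):
                num[k] += w * p[k]
        if den == 0.0:
            break  # no samples within numerical reach of x
        new = [nk / den for nk in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, x)))
        x = new
        if shift < tol:
            break
    return x


cluster_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (-0.2, 0.0), (0.0, -0.2)]
cluster_b = [(10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
print(mean_shift_mode(cluster_a + cluster_b, (0.5, 0.5), bandwidth=1.0))
```

Starting points in different basins converge to different modes, which is what lets Mean Shift separate (and, frame to frame, follow) distinct objects.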

chapter

GPU-based Cloud computing for comparing the structure of protein binding sites

Matthias Leinweber, Lars Baumgärtner, Marco Mernberger, Thomas Fober, et al.

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST) > 1 - 6

2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST 2012) - Complex Environment Engineering

In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been...

chapter

Particle Swarm Optimization on a GPU

Mikhail Rabinovich, Phillip Kainga, David Johnson, Brandon Shafer, et al.

2012 IEEE International Conference on Electro/Information Technology > 1 - 6

2012 IEEE International Conference on Electro/Information Technology (EIT 2012)

Optimization problems that contain discontinuities, non-linearity, or high dimensionality are difficult to solve and time consuming using conventional computational methods. This paper introduces a tool that solves these kinds of optimization problems using a patent pending Gaming Particle Swarm Optimization (GPSO) algorithm implemented on Graphics Processing Unit (GPU) hardware. Our study applied...
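The GPSO algorithm in this entry is patent pending and its details are not given in the abstract, so the sketch below shows only the standard global-best particle swarm optimization it builds its name on. Each particle is pulled toward its own best position and the swarm's best position; evaluating particles independently is what maps well onto GPU threads. The function `pso` and its default coefficients (`w`, `c1`, `c2`) are illustrative assumptions, not the authors' settings.

```python
import random
from typing import Callable, List, Tuple


def pso(f: Callable[[List[float]], float], dim: int, bounds: Tuple[float, float],
        n_particles: int = 30, iters: int = 200, w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over the box [lo, hi]^dim with global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # Particles are independent within an iteration (one GPU thread each).
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```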

chapter

Multi-biomarker panel selection on a GPU

David Johnson, Brandon Shafer, Jaehwan John Lee, Jake Y. Chen

2012 IEEE International Conference on Electro/Information Technology > 1 - 6

2012 IEEE International Conference on Electro/Information Technology (EIT 2012)

Liquid chromatography-based tandem mass spectrometry (LC-MS) technique allows for identification and quantification of thousands of proteins in parallel. This technique coupled with a feed-forward artificial neural network provides a technique to analyze and select protein panels for use in multi-biomarker panel discovery applications. In this study, we enhance this technique by utilizing massively...

chapter

An algorithm to solve the Dominating Set Problem on GPUs

Christian Trefftz

2012 IEEE International Conference on Electro/Information Technology > 1 - 4

2012 IEEE International Conference on Electro/Information Technology (EIT 2012)

A brute-force algorithm to solve small instances of the Dominating Set Problem on GPUs is presented. Two implementations of the algorithm are discussed, one that uses atomic operations and one that uses reductions. Experimental results are reported.
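A brute-force search of this kind can be sketched by encoding each candidate vertex subset as a bitmask and testing whether the union of the chosen vertices' closed neighborhoods covers the whole graph. In a GPU version, each mask would plausibly map to one thread, and the "keep the smallest dominating set" step is where an atomic operation or a reduction would go. This sequential sketch is an assumption about the general approach, not the author's code; the helper names are hypothetical and vertices are assumed to be numbered 0 to n-1.

```python
from typing import List, Tuple


def closed_neighborhoods(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Bitmask of N[v] = {v} plus the neighbors of v, for every vertex v."""
    nbr = [1 << v for v in range(n)]
    for u, v in edges:
        nbr[u] |= 1 << v
        nbr[v] |= 1 << u
    return nbr


def dominates(mask: int, nbr: List[int]) -> bool:
    """True if the vertices in `mask` dominate every vertex of the graph."""
    covered, v, m = 0, 0, mask
    while m:
        if m & 1:
            covered |= nbr[v]
        m >>= 1
        v += 1
    return covered == (1 << len(nbr)) - 1


def min_dominating_set(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    nbr = closed_neighborhoods(n, edges)
    best = (1 << n) - 1  # the full vertex set always dominates
    # Each mask is an independent test; the comparison with `best` is the
    # step a GPU version would resolve with atomics or a min-reduction.
    for mask in range(1 << n):
        if bin(mask).count("1") < bin(best).count("1") and dominates(mask, nbr):
            best = mask
    return [v for v in range(n) if best >> v & 1]


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(min_dominating_set(5, path))  # [0, 3]
```

The search is exponential in n, which is consistent with the abstract restricting the method to small instances.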

chapter

Parallel algorithm of amplitude correction for time-lapse seismic data based on GPU

Zheng Wenjing, Liu Qicheng, Song Yibin, Tong Xiangrong, et al.

2012 International Conference on Systems and Informatics (ICSAI2012) > 924 - 926

2012 International Conference on Systems and Informatics (ICSAI)

Cross equalization is the core step of time-lapse seismic data processing; it can effectively eliminate the influence of inconsistent acquisition, data processing and tube processing parameters. As the volume of time-lapse seismic data and the amount of processing increase, moving seismic data processing onto massively parallel processors becomes an inevitable trend. It deals with the time-lapse...

chapter

GPU accelerated simulation of the human arterial circulation

Lucian Itu, Sharma Puneet, Ali Kamen, Constantin Suciu, et al.

2012 13th International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) > 1478 - 1485

2012 13th International Conference on Optimization of Electrical and Electronic Equipment

A GPU accelerated implementation of a reduced-order model of the human arterial circulation is introduced. The computationally intensive tasks of the algorithm (namely, the computation of the flow rate and area values at the interior grid points of the domain) have been migrated to the GPU. The CPU not only coordinates the actions performed by the GPU, but it also computes the inflow, bifurcation...

chapter

Gigapixel spotlight synthetic aperture radar backprojection using clusters of GPUs and CUDA

Thomas M. Benson, Daniel P. Campbell, Daniel A. Cook

2012 IEEE Radar Conference > 853 - 858

2012 IEEE Radar Conference (RadarCon)

Synthetic aperture radar (SAR) image formation via backprojection offers a robust mechanism by which to form images on general, non-planar surfaces, without often restrictive assumptions regarding the planarity of the wavefront at the locations being imaged. However, backprojection presents a substantially increased computational load relative to other image formation algorithms that typically depend...

chapter

A GPU implementation of color digital halftoning using the Direct Binary Search algorithm

Kartheek Chandu, Mikel Stanich, Barry Trager, Chai Wah Wu

2012 IEEE International Symposium on Circuits and Systems > 185 - 188

2012 IEEE International Symposium on Circuits and Systems - ISCAS 2012

We illustrate how employing Graphics Processing Units (GPU) can speed-up intensive image processing operations. In particular, we demonstrate the use of the NVIDIA CUDA architecture to implement a color digital binary halftoning algorithm based on Direct Binary Search (DBS). Halftoning a color image is more computationally expensive than the single color case as there is a need to minimize dot interaction...

Data set:
ieee
Keywords:
KERNEL
GRAPHICS PROCESSING UNIT

Content availability

Available (593)
None (1)

Publication type

book (547)
article (47)

Keywords

INSTRUCTION SETS (306)
GPU (191)
COPROCESSORS (164)
CUDA (145)
COMPUTER GRAPHIC EQUIPMENT (139)
COMPUTATIONAL MODELING (112)
COMPUTER ARCHITECTURE (106)
PARALLEL PROCESSING (106)
GPGPU (73)
OPTIMIZATION (72)
HARDWARE (64)
ARRAYS (62)
PROGRAMMING (55)
MEMORY MANAGEMENT (49)
ACCELERATION (48)
PERFORMANCE EVALUATION (47)
GRAPHICS PROCESSING UNITS (46)
MATHEMATICAL MODEL (42)
ALGORITHM DESIGN AND ANALYSIS (39)
VECTORS (37)
OPENCL (36)
PARALLEL ARCHITECTURES (35)
COMPUTE UNIFIED DEVICE ARCHITECTURE (34)
LIBRARIES (34)
SYNCHRONIZATION (34)
REGISTERS (33)
SPARSE MATRICES (33)
CENTRAL PROCESSING UNIT (31)
COMPUTER GRAPHICS (31)
PIXEL (31)
INDEXES (30)
PARALLEL ALGORITHMS (28)
MULTIPROCESSING SYSTEMS (27)
PARALLEL PROGRAMMING (27)
BANDWIDTH (26)
PARALLEL COMPUTING (26)
EQUATIONS (25)
BENCHMARK TESTING (24)
CONVOLUTION (21)
HIGH PERFORMANCE COMPUTING (21)
MULTICORE PROCESSING (21)
REAL TIME SYSTEMS (21)
GRAPHICS (19)
OPTIMISATION (19)
RUNTIME (19)
THREE DIMENSIONAL DISPLAYS (19)
THROUGHPUT (19)
FIELD PROGRAMMABLE GATE ARRAYS (18)
YARN (18)
IMAGE PROCESSING (17)
RANDOM ACCESS MEMORY (16)
CPU (15)
OPENMP (15)
FEATURE EXTRACTION (14)
GENETIC ALGORITHMS (14)
GPU COMPUTING (14)
GRAPHIC PROCESSING UNIT (14)
TILES (14)
ACCURACY (13)
DATABASES (13)
ENCODING (13)
IMAGE COLOR ANALYSIS (13)
IMAGE RECONSTRUCTION (13)
INTERPOLATION (13)
MATRIX MULTIPLICATION (13)
PIPELINES (13)
SERVERS (13)
LAYOUT (12)
MEDICAL IMAGE PROCESSING (12)
MESSAGE SYSTEMS (12)
MPI (12)
CLUSTERING ALGORITHMS (11)
CONTEXT (11)
DATA STRUCTURES (11)
EDUCATIONAL INSTITUTIONS (11)
ITERATIVE METHODS (11)
JACOBIAN MATRICES (11)
SORTING (11)
TRAINING (11)
ULTRASONIC IMAGING (11)
BIOINFORMATICS (10)
DECODING (10)
IMAGE SEGMENTATION (10)
LATTICES (10)
LINEAR ALGEBRA (10)
NVIDIA (10)
PERFORMANCE (10)
PROTEINS (10)
APPLICATION PROGRAM INTERFACES (9)
CLOCKS (9)
ENERGY CONSUMPTION (9)
ENERGY EFFICIENCY (9)
EVOLUTIONARY COMPUTATION (9)
GRAPHICS PROCESSING UNIT (GPU) (9)
MULTI-THREADING (9)
POLYNOMIALS (9)
SCHEDULES (9)
BIOLOGY COMPUTING (8)

INFONA - science communication portal