Search results

Items from 61 to 80 out of 594 results

chapter

Energy Analysis of Parallel Scientific Kernels on Multiple GPUs

Sayan Ghosh, Sunita Chandrasekaran, Barbara Chapman

2012 Symposium on Application Accelerators in High Performance Computing > 54 - 63

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

A dramatic improvement in energy efficiency is mandatory for sustainable supercomputing and has been identified as a major challenge. Affordable energy solutions continue to be of great concern in the development of the next generation of supercomputers. Low-power processors, dynamic control of processor frequency and heterogeneous systems are being proposed to mitigate energy costs. However, the...

chapter

GPU Acceleration of Pyrosequencing Noise Removal

Yang Gao, Jason D. Bakos

2012 Symposium on Application Accelerators in High Performance Computing > 94 - 101

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

AmpliconNoise [1], an updated version of PyroNoise [2], is a tool for removing noise from metagenomic data recorded by a 454 pyrosequencer. AmpliconNoise has been shown to be effective in reducing overestimation of operational taxonomic units (OTUs) and in chimera detection. AmpliconNoise's noise removal method relies on clustering a large set of short sequences read by the sequencer. The DNA sequencing...

chapter

Computing large-scale distance matrices on GPU

Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta, Pablo Moscato

2012 7th International Conference on Computer Science & Education (ICCSE) > 576 - 580

2012 7th International Conference on Computer Science & Education (ICCSE 2012)

A distance matrix is simply an n×n two-dimensional array that contains the pairwise distances of a set of n points in a metric space. It has a wide range of uses in several fields of scientific research, e.g., data clustering, machine learning, pattern recognition, image analysis, information retrieval, signal processing, bioinformatics, etc. However, as the size of n increases, the computation of distance...
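
The definition in this abstract can be illustrated with a minimal CPU-side sketch. It is not the paper's GPU method: the function name `distance_matrix` and the choice of Euclidean distance are assumptions made here for illustration only.

```python
import math


def distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances.

    Hypothetical helper: 'points' is a list of equal-length coordinate tuples.
    Only the upper triangle is computed; the matrix is symmetric.
    """
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])  # Euclidean distance
            d[i][j] = dist
            d[j][i] = dist  # mirror into the lower triangle
    return d


matrix = distance_matrix([(0, 0), (3, 4), (6, 8)])
```

The double loop performs n(n-1)/2 distance evaluations, so the work grows quadratically with n. That growth is the cost the abstract refers to as n increases.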

chapter

Implementation and evaluation of Raptor code on GPU

Linjia Hu, Saeid Nooshabadi, Todor Mladenov

2012 IEEE 16th International Symposium on Consumer Electronics > 1 - 6

2012 IEEE 16th International Symposium on Consumer Electronics (ISCE 2012)

Raptor code, a member of the fountain code family, is a significant theoretical improvement over the Luby transform code (LT code) for forward error correction (FEC) transmission. Graphics processing units (GPUs) have become commonplace in the consumer market and are finding their way beyond graphics processing into general purpose computing. This paper investigates the suitability of GPU for Raptor...

chapter

iGPU: Exception support and speculative execution on GPUs

Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam

2012 39th Annual International Symposium on Computer Architecture (ISCA) > 72 - 83

2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA)

Since the introduction of fully programmable vertex shader hardware, GPU computing has made tremendous advances. Exception support and speculative execution are the next steps to expand the scope and improve the usability of GPUs. However, traditional mechanisms to support exceptions and speculative execution are highly intrusive to GPU hardware design. This paper builds on two related insights to...

chapter

Implementation of Motion Estimation Based on Heterogeneous Parallel Computing System with OpenCL

Jinglin Zhang, Jean-Francois Nezan, Jean-Gabriel Cousin

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 41 - 45

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Heterogeneous computing systems increase the performance of parallel computing in many domains of general-purpose computing with CPUs, GPUs and other accelerators. Open Computing Language (OpenCL) is the first open, royalty-free standard for heterogeneous computing on multiple hardware platforms. In this paper, we propose a parallel Motion Estimation (ME) algorithm implemented using OpenCL and present several...

chapter

OpenCL Remote: Extending OpenCL Platform Model to Network Scale

Ridvan Ozaydin, D. Turgay Altilar

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 830 - 835

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

This paper presents the OpenCL Remote framework, which extends the native OpenCL platform model to network scale and utilizes native OpenCL's support for heterogeneous computing. OpenCL Remote boosts performance by distributing computation over the network to many compute devices in parallel.

chapter

Implementation of a Lattice Boltzmann Method for Large Eddy Simulation on Multiple GPUs

Qinjian Li, Chengwen Zhong, Kai Li, Guangyong Zhang, et al.

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 818 - 823

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Recently, the Graphic Processor Unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth. To improve the simulation efficiency of complex flow phenomena in the field of computational fluid dynamics, a CUDA-based simulation algorithm of large eddy simulation using multiple GPUs is proposed. Our implementation...

chapter

UFO: A Scalable GPU-based Image Processing Framework for On-line Monitoring

Matthias Vogelgesang, Suren Chilingaryan, Tomy dos Santos Rolo, Andreas Kopmann

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 824 - 829

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Current synchrotron experiments require state-of-the-art scientific cameras with sensors that provide several million pixels, each at a dynamic range of up to 16 bits and the ability to acquire hundreds of frames per second. The resulting data bandwidth of such a data stream reaches several Gigabits per second. These streams have to be processed in real-time to achieve a fast process response. In...

chapter

Implementation of histogram based sampling algorithm within an EDA scheme with CUDA

Shigeyoshi Tsutsui, Noriyuki Fujimoto

2012 IEEE Congress on Evolutionary Computation > 1 - 8

2012 IEEE Congress on Evolutionary Computation (CEC)

In this paper, we describe an implementation of the Node Histogram Based Sampling Algorithm (NHBSA) on GPUs with CUDA and apply the algorithm to solve large scale QAP instances. To solve large scale QAP instances, we combined the taboo search with NHBSA. In this implementation, we used an efficient thread assignment method, Move-Cost Adjusted Thread Assignment (MATA), which is proposed in a previous study....

chapter

Multi-GPU island-based genetic algorithm for solving the knapsack problem

Jiri Jaros

2012 IEEE Congress on Evolutionary Computation > 1 - 8

2012 IEEE Congress on Evolutionary Computation (CEC)

This paper introduces a novel implementation of the genetic algorithm exploiting a multi-GPU cluster. The proposed implementation employs an island-based genetic algorithm where every GPU evolves a single island. The individuals are processed by CUDA warps, which enables the solution of large knapsack instances and eliminates undesirable thread divergence. The MPI interface is used to exchange genetic...

chapter

Fast calculation of computer-generated holography using multi-graphic processing units

Joongseok Song, Jungsik Park, Jong-Il Park

IEEE International Symposium on Broadband Multimedia Systems and Broadcasting > 1 - 5

2012 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)

A process of generating a digital hologram requires a lot of time-consuming computations. Therefore, it is important to reduce the computation time or the number of computations for achieving real-time digital holographic video generation. In this paper, we propose a method of parallelizing the computations using multiple GPUs with CUDA and OpenMP and an optimization method for reducing the computation...

chapter

Automatic implementation of evolutionary algorithms on GPUs using ESDL

Steve Dower

2012 IEEE Congress on Evolutionary Computation > 1 - 8

2012 IEEE Congress on Evolutionary Computation (CEC)

Modern computer processing units tend towards simpler cores in greater numbers, favouring the development of data-parallel applications. Evolutionary algorithms are ideal for taking full advantage of SIMD (Single Instruction, Multiple Data) processing, which is available on both CPUs and GPUs. Creating software that runs on a GPU requires the use of specialised programming languages or styles, forcing...

chapter

An Efficient Sparse Matrix Multiplication for Skewed Matrix on GPU

Monika Shah, Vibha Patel

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 1301 - 1306

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

This paper presents a new sparse matrix format, ALIGNED_COO, an extension to the COO format that optimizes the performance of large sparse matrices with a skewed distribution of non-zero elements. Load balancing, alignment and synchronization-free distribution of workload are three important factors in improving the performance of sparse matrices representing power-law graphs. Coordinate (COO) format is selected for...

chapter

Acceleration of Generalized Minimum Aberration Designs of Hadamard Matrices on Graphics Processing Units

Jon Calhoun, Josh Graham, Hong Zhou, Hai Jiang

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 1294 - 1300

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The process of applying generalized minimum aberration criteria (GMAC) to non-regular fractional factorial designs is extremely computationally intensive. Constructing and ranking all designs can take hours if not days; therefore, the massively parallel nature of modern graphics processing units (GPUs) is exploited to perform the task. The computation is not just ported to the GPU, but...

chapter

Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPU

You Li, Xiaowen Chu

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 1315 - 1320

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Database searching is a main method for protein identification in shotgun proteomics, and many research efforts are dedicated to improving its effectiveness. However, the efficiency of database searching is facing a serious challenge, due to the ever-faster growth of protein and peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications (PTMs). On...

chapter

Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow

Mehdi Goli, Michael T. Garba, Horacio González-Vélez

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 445 - 452

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Software pipelines permit the decomposition of a repetitive sequential process into a succession of distinguishable sub-processes called stages, each of which can be concurrently executed on a distinct processing element. This paper presents a heterogeneous streaming pipeline implementation using the FastFlow skeletal library for a numerical linear algebra code. By introducing minimal memory management,...

chapter

Ice Simulation Using GPGPU

Shadi Alawneh, Dennis Peters

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 425 - 431

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Simulation of the behaviour of a ship operating in pack ice is a computationally intensive process to which General Purpose Computing on Graphical Processing Units (GPGPU) can be applied. In this paper we present an efficient parallel implementation of such a simulator developed using the NVIDIA Compute Unified Device Architecture (CUDA). We have conducted an experiment to measure the relative performance...

chapter

Implementation and Analysis of AES Encryption on GPU

Qinjian Li, Chengwen Zhong, Kaiyong Zhao, Xinxin Mei, et al.

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 843 - 848

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPU is continuing its trend of vastly outperforming CPU while becoming more general purpose. In order to improve the efficiency of AES algorithm, this paper proposed a CUDA implementation of Electronic Codebook (ECB) mode encoding process and Cipher Feedback (CBC) mode decoding process on GPU. In our implementation, the frequently accessed T-boxes were allocated on on-chip shared memory and the granularity...

chapter

Accelerating Viola-Jones Face Detection Algorithm on GPUs

Haipeng Jia, Yunquan Zhang, Weiyan Wang, Jianliang Xu

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 396 - 403

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The Viola-Jones face detection algorithm represents a class of parallel algorithms in which both memory accesses and work distributions are irregular, making high performance on GPUs hard to obtain. Furthermore, conventional GPU programming wisdom usually guides us on how to optimize data parallel workloads with regular inputs and outputs. While how to efficiently write task-level parallelism programs...

Data set:
ieee
Keywords:
KERNEL
GRAPHICS PROCESSING UNIT

Content availability

Available (593)
None (1)

Publication type

book (547)
article (47)

Keywords

INSTRUCTION SETS (306)
GPU (191)
COPROCESSORS (164)
CUDA (145)
COMPUTER GRAPHIC EQUIPMENT (139)
COMPUTATIONAL MODELING (112)
COMPUTER ARCHITECTURE (106)
PARALLEL PROCESSING (106)
GPGPU (73)
OPTIMIZATION (72)
HARDWARE (64)
ARRAYS (62)
PROGRAMMING (55)
MEMORY MANAGEMENT (49)
ACCELERATION (48)
PERFORMANCE EVALUATION (47)
GRAPHICS PROCESSING UNITS (46)
MATHEMATICAL MODEL (42)
ALGORITHM DESIGN AND ANALYSIS (39)
VECTORS (37)
OPENCL (36)
PARALLEL ARCHITECTURES (35)
COMPUTE UNIFIED DEVICE ARCHITECTURE (34)
LIBRARIES (34)
SYNCHRONIZATION (34)
REGISTERS (33)
SPARSE MATRICES (33)
CENTRAL PROCESSING UNIT (31)
COMPUTER GRAPHICS (31)
PIXEL (31)
INDEXES (30)
PARALLEL ALGORITHMS (28)
MULTIPROCESSING SYSTEMS (27)
PARALLEL PROGRAMMING (27)
BANDWIDTH (26)
PARALLEL COMPUTING (26)
EQUATIONS (25)
BENCHMARK TESTING (24)
CONVOLUTION (21)
HIGH PERFORMANCE COMPUTING (21)
MULTICORE PROCESSING (21)
REAL TIME SYSTEMS (21)
GRAPHICS (19)
OPTIMISATION (19)
RUNTIME (19)
THREE DIMENSIONAL DISPLAYS (19)
THROUGHPUT (19)
FIELD PROGRAMMABLE GATE ARRAYS (18)
YARN (18)
IMAGE PROCESSING (17)
RANDOM ACCESS MEMORY (16)
CPU (15)
OPENMP (15)
FEATURE EXTRACTION (14)
GENETIC ALGORITHMS (14)
GPU COMPUTING (14)
GRAPHIC PROCESSING UNIT (14)
TILES (14)
ACCURACY (13)
DATABASES (13)
ENCODING (13)
IMAGE COLOR ANALYSIS (13)
IMAGE RECONSTRUCTION (13)
INTERPOLATION (13)
MATRIX MULTIPLICATION (13)
PIPELINES (13)
SERVERS (13)
LAYOUT (12)
MEDICAL IMAGE PROCESSING (12)
MESSAGE SYSTEMS (12)
MPI (12)
CLUSTERING ALGORITHMS (11)
CONTEXT (11)
DATA STRUCTURES (11)
EDUCATIONAL INSTITUTIONS (11)
ITERATIVE METHODS (11)
JACOBIAN MATRICES (11)
SORTING (11)
TRAINING (11)
ULTRASONIC IMAGING (11)
BIOINFORMATICS (10)
DECODING (10)
IMAGE SEGMENTATION (10)
LATTICES (10)
LINEAR ALGEBRA (10)
NVIDIA (10)
PERFORMANCE (10)
PROTEINS (10)
APPLICATION PROGRAM INTERFACES (9)
CLOCKS (9)
ENERGY CONSUMPTION (9)
ENERGY EFFICIENCY (9)
EVOLUTIONARY COMPUTATION (9)
GRAPHICS PROCESSING UNIT (GPU) (9)
MULTI-THREADING (9)
POLYNOMIALS (9)
SCHEDULES (9)
BIOLOGY COMPUTING (8)

INFONA - science communication portal
