Search results

Items from 101 to 120 out of 594 results

chapter

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Shucai Xiao, Wu-chun Feng

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2554 - 2557

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging...

chapter

GPU Implementation of the Branch and Bound Method for Knapsack Problems

Mohamed Esseghir Lalami, Didier El-Baz

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1769 - 1777

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper, we propose an efficient implementation of the branch and bound method for knapsack problems on a CPU-GPU system via CUDA. Branch and bound computations can be carried out either on the CPU or on a GPU according to the size of the branch and bound list. Better management of GPU memories, fewer GPU-CPU communications and better synchronization between GPU threads are proposed in this...
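
For readers unfamiliar with the technique this abstract names, the following is a minimal sequential sketch of branch and bound for the 0/1 knapsack problem. It is not the authors' CUDA implementation: the function name `knapsack_branch_and_bound`, the greedy fractional upper bound, and the depth-first node list are illustrative assumptions, and item weights are assumed to be positive.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Return the best total value for a 0/1 knapsack instance (weights > 0)."""
    # Sort by value density so the greedy fractional bound below is valid.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def upper_bound(index, value, room):
        # Fractional relaxation: fill greedily, take a fraction of the first misfit.
        for v, w in items[index:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0
    # The "branch and bound list": each node is
    # (next item index, value so far, remaining capacity).
    nodes = [(0, 0, capacity)]
    while nodes:
        index, value, room = nodes.pop()
        best = max(best, value)
        # Prune leaves and nodes whose bound cannot beat the incumbent.
        if index == n or upper_bound(index, value, room) <= best:
            continue
        v, w = items[index]
        nodes.append((index + 1, value, room))              # branch: skip the item
        if w <= room:
            nodes.append((index + 1, value + v, room - w))  # branch: take the item
    return best
```

In this sketch the node list is processed one node at a time; the abstract's point is that the list size can decide whether such work runs on the CPU or is offloaded to a GPU.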

chapter

Energy Efficiency Analysis of GPUs

Juan M. Cebrián, Gines D. Guerrero, Jose M. Garcia

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1014 - 1022

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what is known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance per watt is really high. For massively...

chapter

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

Yongchao Liu, Bertil Schmidt

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 684 - 690

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Unprecedented production of short reads from the new high-throughput sequencers has posed challenges to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...

chapter

Parameterized Verification of GPU Kernel Programs

Guodong Li, Ganesh Gopalakrishnan

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2450 - 2459

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced during memory coalescing and bank conflict elimination related optimizations. Key features of our work include: (1) a symbolic method...

chapter

An implementation of Coincidence Algorithm on Graphic Processing Units

Thitipan Tongsiri, Prabhas Chongstitvatana

2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE) > 126 - 130

2012 International Joint Conference on Computer Science and Software Engineering (JCSSE)

Genetic Algorithms (GAs) are powerful search techniques. However, when they are applied to complex problems, they consume a large amount of computation power. One way to make them faster is to use a parallel implementation. This paper presents a parallel implementation of Combinatorial Optimisation with Coincidence Algorithm (COIN) on Graphic Processing Units. COIN is a modern GA. It has a wide range...

chapter

Towards the Design of Systolic Genetic Search

Martin Pedemonte, Enrique Alba, Francisco Luna

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1778 - 1786

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

This paper elaborates on a new, fresh parallel optimization algorithm specially engineered to run on Graphic Processing Units (GPUs). The underlying operation relates to Systolic Computation. The algorithm, called Systolic Genetic Search (SGS), is based on the synchronous circulation of solutions through a grid of processing units and tries to profit from the parallel architecture of GPUs. The proposed...

chapter

Design of Direct Communication Facility for Many-Core Based Accelerators

Min Si, Yutaka Ishikawa

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 924 - 929

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

A direct communication facility, called DCFA, for a many-core based cluster, whose compute node consists of many-core units connected to the host via PCI Express with Infiniband, is designed and evaluated. Because a many-core unit is a device of the PCI Express bus, it is not capable of configuring and initializing the Infiniband HCA, according to the PCI Express specification. This means that the...

chapter

dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems

Philipp Kegel, Michel Steuwer, Sergei Gorlatch

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 174 - 186

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Modern computer systems are becoming increasingly heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed...

chapter

Implementing High-performance Intensity Model with Blur Effect on GPUs for Large-scale Star Image Simulation

Chao Li, Yunquan Zhang, Changwen Zheng, Xiaohui Hu

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1879 - 1888

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Intensity models with blur effects are widely employed to accurately simulate the imaging process of a star simulator used for attitude determination and guiding feedback. The model is computationally intensive and the time requirements are proportional to the number of stars in the simulation, imposing great demands on computing power for realistic uses. This paper presents two star simulators using...

chapter

Experiences in Teaching a Specialty Multicore Computing Course

Peter E. Strazdins

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1283 - 1288

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We detail the design and experiences in delivering a specialty multicore computing course whose materials are openly available. The course ambitiously covers three multicore programming paradigms: shared memory (OpenMP), device (CUDA) and message passing (RCCE), and involves significant practical work on their respective platforms: an UltraSPARC T2, Fermi GPU and the Intel Single-Chip Cloud Computer...

chapter

A Speculative HMMER Search Implementation on GPU

Xiaoqiang Li, Wenting Han, Gu Liu, Hong An, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 735 - 741

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Due to exponentially growing bioinformatics databases and the rapidly growing popularity of GPUs for general purpose computing, it is promising to employ GPU techniques to accelerate the sequence search process. Hmmsearch from the HMMER bioinformatics software package is a widely used software tool for sensitive profile HMM (Hidden Markov Model) searches of biological sequence databases. In this paper, we implement...

chapter

Implementation of XcalableMP Device Acceleration Extention with OpenCL

Takuma Nomizu, Daisuke Takahashi, Jinpil Lee, Taisuke Boku, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2394 - 2403

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL,...

chapter

Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters

Kumiko Maeda, Masana Murase, Munehiro Doi, Hideaki Komatsu, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 544 - 556

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Overlapping computations and communication is a key to accelerating stencil applications on parallel computers, especially for GPU clusters. However, such programming is a time-consuming part of the stencil application development. To address this problem, we developed an automatic code generation tool to produce a parallel stencil application with latency hiding automatically from its dataflow model...

chapter

Generating Device-specific GPU Code for Local Operators in Medical Imaging

Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 569 - 581

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler...

chapter

Dynamic load balancing on GPU clusters for large-scale K-Means clustering

Ekasit Kijsipongse, Suriya U-ruekolan

2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE) > 346 - 350

2012 International Joint Conference on Computer Science and Software Engineering (JCSSE)

K-Means is a clustering algorithm that is widely used in many areas such as information retrieval, computer vision and pattern recognition. With recent advances in General Purpose Graphics Processing Units (GPGPU), we can use a modern GPU, which is capable of computation up to the Tflops range, to calculate K-Means clustering on average problems. However, due to the exponential growth of data, the K-Means...

chapter

Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library

Michel Steuwer, Philipp Kegel, Sergei Gorlatch

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1858 - 1865

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches - CUDA and OpenCL - are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers pre-implemented recurring computation and communication patterns (skeletons)...

chapter

Deriving a Methodology for Code Deployment on Multi-Core Platforms via Iterative Manual Optimizations

Stuart McCool, Peter Milligan, Paul Sage

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1406 - 1415

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In recent years, there has been what can only be described as an explosion in the types of processing devices one can expect to find within a given computer system. These include the multi-core CPU, the General Purpose Graphics Processing Unit (GPGPU) and the Accelerated Processing Unit (APU), to name but a few. The widespread uptake of these systems presents would-be users with at least two problems...

chapter

Evaluating Polynomials in Several Variables and their Derivatives on a GPU Computing Processor

Jan Verschelde, Genady Yoffe

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1397 - 1405

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods we use multiprecision arithmetic. Our goal is to offset the overhead of double double arithmetic accelerating the path trackers and in particular Newton's method with a general purpose graphics processing unit. In this paper we describe algorithms for the massively parallel evaluation and differentiation...

chapter

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs

Daichi Mukunoki, Daisuke Takahashi

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1378 - 1386

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We implemented and evaluated the triple precision Basic Linear Algebra Subprograms (BLAS) subroutines, AXPY, GEMV and GEMM on a Tesla C2050. In this paper, we present a Double Single (D+S) type triple precision floating-point value format and operations. They are based on techniques similar to Double-Double (DD) type quadruple precision operations. On the GPU, the D+S-type operations are more costly...
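
The abstract refers to techniques similar to Double-Double (DD) arithmetic. As a rough illustration of that family of techniques, the sketch below shows an error-free addition (Knuth's TwoSum) and a simple double-double add built on it. It is not the paper's D+S format or its GPU kernels: both components here are Python floats (IEEE doubles), whereas a D+S value would pair a double with a single, and the names `two_sum` and `dd_add` are illustrative assumptions.

```python
def two_sum(a, b):
    """Error-free addition: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that s + e equals a + b exactly."""
    s = a + b
    b_virtual = s - a            # the part of b that actually made it into s
    a_virtual = s - b_virtual    # the part of a that actually made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def dd_add(x, y):
    """Add two extended-precision numbers stored as (hi, lo) pairs.
    This is the simple variant that folds both low parts into one error term."""
    s, e = two_sum(x[0], y[0])   # add the high parts, keep the rounding error
    e += x[1] + y[1]             # accumulate the low parts
    return two_sum(s, e)         # renormalize into a new (hi, lo) pair


# 1e-20 is lost in a plain double addition but survives in the low word.
hi, lo = dd_add((1.0, 0.0), (1e-20, 0.0))
print(hi, lo)
```

The extra floating-point operations per addition visible in `two_sum` are the kind of overhead that such multi-word formats have to absorb.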

Data set: ieee
Keywords: KERNEL, GRAPHICS PROCESSING UNIT

