Search results

Items from 141 to 160 out of 303 results


chapter

Parallel UPGMA Algorithm on Graphics Processing Units Using CUDA

Yu-Rong Chen, Che Lun Hung, Yu-Shiang Lin, Chun-Yuan Lin, et al.

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 849 - 854

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The construction of phylogenetic trees is important for the computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by the UPGMA often is not a true tree unless the molecular clock assumption holds, the UT is still useful for the clocklike data. However,...

chapter

Fast Linear Algebra on GPU

Lukas Polok, Pavel Smrz

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 439 - 444

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs have been successfully used for acceleration of many mathematical functions and libraries. A common limitation of those libraries is a minimal size of primitives being handled in order to achieve significant speedups compared to their CPU versions. The minimal size requirement can prove prohibitive for many applications. It can be loosened by batching operations to have sufficient amount of data...

chapter

An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units

Walid Abu-Sufah, Asma Abdel Karim

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 453 - 460

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Sparse matrix vector multiplication, SpMV, is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units, GPUs, have been deployed to enhance the performance of this operation. We present a blocked version of the Transposed Jagged Diagonal storage format which is tailored for GPUs, BTJAD. We develop a highly optimized SpMV kernel that takes advantage of the properties...

chapter

Multi-biomarker panel selection on a GPU

David Johnson, Brandon Shafer, Jaehwan John Lee, Jake Y. Chen

2012 IEEE International Conference on Electro/Information Technology > 1 - 6

2012 IEEE International Conference on Electro/Information Technology (EIT 2012)

Liquid chromatography-based tandem mass spectrometry (LC-MS) technique allows for identification and quantification of thousands of proteins in parallel. This technique coupled with a feed-forward artificial neural network provides a technique to analyze and select protein panels for use in multi-biomarker panel discovery applications. In this study, we enhance this technique by utilizing massively...

chapter

GPU Implementation of the Branch and Bound Method for Knapsack Problems

Mohamed Esseghir Lalami, Didier El-Baz

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1769 - 1777

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper, we propose an efficient implementation of the branch and bound method for knapsack problems on a CPU-GPU system via CUDA. Branch and bound computations can be carried out either on the CPU or on a GPU according to the size of the branch and bound list. A better management of GPUs memories, less GPU-CPU communications and better synchronization between GPU threads are proposed in this...

chapter

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

Yongchao Liu, Bertil Schmidt

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 684 - 690

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Unprecedented production of short reads from the new high-throughput sequencers has posed challenges to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...

chapter

Towards the Design of Systolic Genetic Search

Martin Pedemonte, Enrique Alba, Francisco Luna

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1778 - 1786

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

This paper elaborates on a new, fresh parallel optimization algorithm specially engineered to run on Graphic Processing Units (GPUs). The underlying operation relates to Systolic Computation. The algorithm, called Systolic Genetic Search (SGS) is based on the synchronous circulation of solutions through a grid of processing units and tries to profit from the parallel architecture of GPUs. The proposed...

chapter

Implementing High-performance Intensity Model with Blur Effect on GPUs for Large-scale Star Image Simulation

Chao Li, Yunquan Zhang, Changwen Zheng, Xiaohui Hu

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1879 - 1888

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Intensity model with blur effects are widely employed to accurately simulate the imaging process of a star simulator used for attitude determination and guiding feedback. The model is computationally intensive and the time requirements are proportional to the number of stars in the simulation, imposing great demands of computing power for realistic uses. This paper presents two star simulators using...

chapter

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation

Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1696 - 1702

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme...

chapter

Generating Device-specific GPU Code for Local Operators in Medical Imaging

Richard Membarth, Frank Hannig, Jurgen Teich, Mario Korner, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 569 - 581

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler...

chapter

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant V. Kale

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2404 - 2413

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism...

chapter

Parallel Algorithms for Approximate String Matching with k Mismatches on CUDA

Yu Liu, Longjiang Guo, Jinbao Li, Meirui Ren, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2414 - 2422

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Approximate string matching using the k-mismatch technique has been widely applied to many fields such as virus detection and computational biology. The traditional parallel algorithms are all based on multiple processors, which have high costs of computing and communication. GPU has high parallel processing capability, low cost of computing, and less time of communication. To the best of our knowledge,...

chapter

A Highly Efficient Implementation of I/O Functions on GPU

Wei Wu, Feng Bin Qi, Wang Quan He, Shan Shan Wang

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2378 - 2383

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

The API interfaces provided by CUDA can help programmers develop CUDA applications and get high performance in GPU. However, many of the I/O operations are not supported in device codes. This paper has implemented most of the I/O functions through host's agent by using the characteristics of mapped memory in CUDA, such as read/write file and 'printf'. The methods that used to implement these I/O...

chapter

Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

Jie Chen, Balint Joo, William Watson III, Robert Edwards

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2359 - 2368

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high performance host callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same...

chapter

Enabling Mixed OpenMP/MPI Programming on Hybrid CPU/GPU Computing Architecture

Tyng-Yeu Liang, Hung-Fu Li, Jun-Yao Chiu

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2369 - 2377

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Hybrid CPU/GPU computing architecture recently has become an alternative platform for high performance computing. This architecture provides massive computational power with lower energy consumption and less economic cost than the traditional one using only CPUs. However, the complexity of the GPU programming is too high for users to move their applications toward this hybrid computing architecture...

chapter

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization

Zheng Cui, Yun Liang, Kyle Rupnow, Deming Chen

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 83 - 94

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graphics processing units (GPUs) are increasingly critical for general-purpose parallel processing performance. GPU hardware is composed of many streaming multiprocessors, each of which employs the single-instruction multiple-data (SIMD) execution style. This massively parallel architecture allows GPUs to execute tens of thousands of threads in parallel. Thus, GPU architectures efficiently execute...

chapter

Image Authentication Algorithm on GPU

P.L.V. Vihari, Manoj Mishra

2012 International Conference on Communication Systems and Network Technologies > 874 - 878

2012 International Conference on Communication Systems and Network Technologies (CSNT)

As the demand for research on Image/Content authentication has significantly increased, many authentication schemes have been proposed so far. But most of them are time consuming. This research concentrates on decreasing the time needed by an Image authentication algorithm. In this paper, we have shown a CUDA-based implementation of content authentication algorithm with NVIDIA's GeForce 8400 GS GPU...

chapter

Image convolution processing: A GPU versus FPGA comparison

Lucas M. Russo, Emerson C. Pedrino, Edilson Kato, Valentin Obac Roda

2012 VIII Southern Conference on Programmable Logic > 1 - 6

2012 VIII Southern Conference on Programmable Logic (SPL)

Convolution is one of the most important operators used in image processing. With the constant need to increase the performance in high-end applications and the rise and popularity of parallel architectures, such as GPUs and the ones implemented in FPGAs, comes the necessity to compare these architectures in order to determine which of them performs better and in what scenario. In this article, convolution...

chapter

Fast spoken query detection using lower-bound Dynamic Time Warping on Graphical Processing Units

Yaodong Zhang, Kiarash Adl, James Glass

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5173 - 5176

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphical Processing Units (GPUs). The lower-bound estimate and the K nearest neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous...

chapter

CUDA based Particle Swarm Optimization for geophysical inversion

Debanjan Datta, Suman Mehta, Shalivahan, Ravi Srivastava

2012 1st International Conference on Recent Advances in Information Technology (RAIT) > 416 - 420

2012 1st International Conference on Recent Advances in Information Technology (RAIT)

Many geophysical problems are computationally expensive owing to their iterative nature or due to the programs processing to large datasets. Such problems are challenging and have to be approached with extreme caution because a wrong parameter selection will not only lead to wrong results but will also take up a lot of time. The Compute Unified Device Architecture (CUDA) introduced by NVIDIA has enabled...


Keywords:
KERNEL
CUDA


INFONA - science communication portal
