Search results

Items from 21 to 33 out of 33 results

chapter

A Speculative HMMER Search Implementation on GPU

Xiaoqiang Li, Wenting Han, Gu Liu, Hong An, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 735 - 741

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Due to exponentially growing bioinformatics databases and the rapidly rising popularity of GPUs for general-purpose computing, it is promising to employ GPU techniques to accelerate the sequence search process. Hmmsearch, from the HMMER bioinformatics software package, is a widely used tool for sensitive profile HMM (Hidden Markov Model) searches of biological sequence databases. In this paper, we implement...

chapter

The research on parallelized fast trilateral filter on GPU acceleration

Xujie Li

2011 International Conference on Electronics, Communications and Control (ICECC) > 1158 - 1161

2011 International Conference on Electronics, Communications and Control (ICECC)

This paper designs and implements a parallel fast trilateral filter algorithm on GPU. The trilateral filter can be decomposed into two bilateral filters, each implemented separately by the parallelized bilateral filter on CUDA. The performance optimization strategies are discussed in depth. The occupancy of the CUDA kernel for the trilateral filter reaches 0.833, as measured by Visual Profiler....

chapter

G-NetMon: A GPU-accelerated Network Performance Monitoring System

Wenji Wu, Phil DeMar, Don Holmgren, Amitoj Singh

2011 Symposium on Application Accelerators in High-Performance Computing > 76 - 79

2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)

At Fermilab, we have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations. In this work, we explore new opportunities in network traffic monitoring and analysis with GPUs. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab and collaboration...

chapter

Porting Optimized GPU Kernels to a Multi-core CPU: Computational Quantum Chemistry Application Example

Dong Ye, Alexey Titov, Volodymyr Kindratenko, Ivan Ufimtsev, more

2011 Symposium on Application Accelerators in High-Performance Computing > 72 - 75

2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)

We investigate techniques for optimizing a multi-core CPU code back ported from a highly optimized GPU kernel. We show that common sub-expression elimination and loop unrolling optimization techniques improve code performance on the GPU, but not on the CPU. On the other hand, register reuse and loop merging are effective on the CPU and in combination they improve performance of the ported code by...

chapter

A Scalable LDPC Decoder on GPU

K K Abburi

2011 24th International Conference on VLSI Design > 183 - 188

2011 24th International Conference on VLSI Design: concurrently with the 10th International Conference on Embedded Systems Design

A flexible and scalable approach for LDPC decoding on a CUDA-based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on GPU is challenging due to the limited amount of data-parallelism available in this algorithm. To overcome this...

chapter

Auto-tuning Dense Matrix Multiplication for GPGPU with Cache

Xiang Cui, Yifeng Chen, Changyou Zhang, Hong Mei

2010 IEEE 16th International Conference on Parallel and Distributed Systems > 237 - 242

2010 IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS 2010)

In this paper we discuss our experiences in improving the performance of GEMM (both single and double precision) on the Fermi architecture using CUDA, and how new features of Fermi such as the cache affect performance. We find that the addition of a cache to the GPU on the one hand helps the processors take advantage of data locality that arises at runtime, but on the other hand renders the dependency of...

chapter

OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Seyong Lee, Rudolf Eigenmann

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for high performance computing. The CUDA (Compute Unified Device Architecture) programming model provides improved programmability for general computing on GPGPUs. However, its unique execution model and memory model still pose significant challenges for developers of efficient GPGPU code. This paper proposes a new...

chapter

A Micro-benchmark Suite for AMD GPUs

Ryan Taylor, Xiaoming Li

2010 39th International Conference on Parallel Processing Workshops > 387 - 396

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

Optimizing programs for the Graphics Processing Unit (GPU) requires thorough knowledge about the values of architectural features for the new computing platform. However, this knowledge is frequently unavailable, e.g., due to insufficient documentation, which is probably a result of the infancy of general purpose computing on the GPU. What makes the modeling of program performance on GPU even more difficult...

chapter

Password Recovery for RAR Files Using CUDA

Guang Hu, Jianhua Ma, Benxiong Huang

2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing > 486 - 490

2009 International Conference on Dependable, Autonomic and Secure Computing (DASC 2009)

Driven by the insatiable demand for real-time graphics, especially from the computer games market, the graphics processing unit (GPU) has become a major source of computing power in recent years, as GPU performance is surpassing that of contemporary CPUs. This paper presents our study on how to efficiently recover the passwords for encrypted RAR files. Our research focus is on the AES key...

chapter

Direct N-body Kernels for Multicore Platforms

N. Arora, A. Shringarpure, R.W. Vuduc

2009 International Conference on Parallel Processing > 379 - 387

2009 International Conference on Parallel Processing (ICPP 2009)

We present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including those based on the Intel Nehalem and AMD Barcelona systems, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDIA Tesla C870 and C1060 GPU systems. We compare our implementations across platforms on a variety of proxy measures, including performance,...

chapter

Implementations of hardware acceleration for MD4-family algorithms based on GPU

Wenchao Zhou, Hongwei Wu, Xiaochao Li, Donghui Guo

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication > 571 - 574

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication (2009 ASID)

The MD4-family algorithms have been widely applied in the cryptographic field. It has recently been found that MD4-family algorithms are also suitable for random number generators. Since the MD4-family algorithms are compute-intensive, they can be accelerated on Graphics Processing Units (GPUs) to generate massive quantities of high-quality random numbers. This paper presents acceleration of MD4-family algorithms...

chapter

An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases

L. Ligowski, W. Rudnicki

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 8

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The Smith Waterman algorithm for sequence alignment is one of the main tools of bioinformatics. It is used for sequence similarity searches and alignment of similar sequences. High-end graphics processing units (GPUs), used for processing graphics on desktop computers, deliver computational capabilities exceeding those of CPUs by an order of magnitude. Recently these capabilities became accessible...

chapter

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

A. Nukada, Y. Ogata, T. Endo, S. Matsuoka

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Most GPU performance "hypes" have focused around tightly-coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies thereof have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, being...

Data set:
ieee
Keywords:
KERNEL
GPU
REGISTERS
Publication language:
English

