Object tracking is an important task in computer vision applications. One of the crucial challenges is the real-time speed requirement. In this paper we implement an object tracking system in reconfigurable hardware using an efficient parallel architecture. In our implementation, we adopt a background subtraction based algorithm. The designed object tracker exploits hardware parallelism to achieve...
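The abstract above is truncated before the algorithm details; as an illustration only, here is a minimal software sketch of the core per-pixel work in background-subtraction tracking. All names, the list-of-lists frame representation, and the threshold value are assumptions for the sketch, not taken from the paper (whose hardware architecture parallelises this kind of per-pixel comparison).

```python
# Sketch of background-subtraction object detection, assuming grayscale
# frames stored as 2-D lists of 0-255 intensities. Illustrative only.

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the background model
    by more than the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def bounding_box(mask):
    """Bounding box (min_row, min_col, max_row, max_col) of foreground
    pixels, or None if no pixel is foreground."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

Every pixel comparison in `foreground_mask` is independent, which is why the algorithm maps well onto a parallel hardware pipeline.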
Achievable accuracy for mixed-precision iterative refinement depends on the precisions supported by the computing platform. Although arithmetic-unit precision can be tailored on programmable-logic architectures (e.g. FPGAs), previous work rarely discusses the performance benefits this flexibility enables. Hence, we propose an iterative refinement approach on FPGAs...
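To make the technique behind this abstract concrete, here is a small sketch of classical mixed-precision iterative refinement on a 2x2 system: solve cheaply in low precision, compute the residual in full precision, and solve for a low-precision correction. Rounding to three significant digits stands in for a narrow FPGA arithmetic unit; the matrix size, function names, and iteration count are illustrative assumptions, not details from the paper.

```python
def lowp(x):
    """Round to 3 significant digits: a crude stand-in for a narrow
    (low-precision) arithmetic unit."""
    return float(f"{x:.3g}")

def solve2x2_lowp(A, b):
    """Cramer's rule with every intermediate rounded to low precision,
    mimicking a solver built from narrow arithmetic units."""
    det = lowp(lowp(A[0][0] * A[1][1]) - lowp(A[0][1] * A[1][0]))
    x0 = lowp(lowp(lowp(b[0] * A[1][1]) - lowp(b[1] * A[0][1])) / det)
    x1 = lowp(lowp(lowp(A[0][0] * b[1]) - lowp(A[1][0] * b[0])) / det)
    return [x0, x1]

def refine(A, b, iters=5):
    """Mixed-precision iterative refinement for a 2x2 system."""
    x = solve2x2_lowp(A, b)                  # cheap low-precision solve
    for _ in range(iters):
        # residual computed in full (double) precision
        r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
        d = solve2x2_lowp(A, r)              # low-precision correction solve
        x = [x[i] + d[i] for i in range(2)]  # accumulate in full precision
    return x
```

Each pass shrinks the error by roughly the relative accuracy of the low-precision solver, so a handful of cheap iterations recovers full-precision accuracy.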
In the FPGA design flow, placement remains one of the most time-consuming stages, and is also crucial in terms of quality of result. HPWL and Star+ are widely used as cost metrics in FPGA placement for estimating the total wire-length of a candidate placement prior to routing. However, both wire-length models are expensive to compute, requiring O(nm) time, where n is the number of nets and m is the...
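For readers unfamiliar with the metric, a minimal sketch of the standard HPWL (half-perimeter wirelength) cost computed from scratch; the data layout (nets as cell-name lists, placement as a dict of coordinates) is an assumption for illustration, and the Star+ model mentioned in the abstract is not shown.

```python
def hpwl(nets, placement):
    """Total half-perimeter wirelength: for each net, the width plus
    height of the bounding box of its cells' (x, y) positions, summed
    over all nets. Visiting every pin of every net is what makes the
    from-scratch evaluation O(total pins)."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total
```

An annealing-based placer evaluates this cost after every candidate move, which is why cheap (often incremental) wirelength estimation matters so much.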
Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase performance even...
Graphics Processing Units (GPUs) have enabled significant improvements in computational performance compared to traditional CPUs in several application domains. Until recently, GPUs have been programmed using C/C++-based methods such as CUDA (NVIDIA) and OpenCL (NVIDIA and AMD). Using these approaches, Fortran Numerical Weather Prediction (NWP) codes would have to be completely re-written to take...
The potential for GPUs and many-core CPUs to support high-performance computation in the area of computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multi-fluid volume-fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. This implementation...
Dynamic Programming (DP) is a method for efficiently solving a broad range of search and optimization problems. As a result, techniques for managing large-scale DP problems are often critical to the performance of many applications. DP algorithms are often hard to parallelize. In this paper, we address the challenge of exploiting fine-grain parallelism on a family of DP algorithms known as non-serial...
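As one concrete member of the non-serial DP family (the abstract is truncated before the paper's own examples), here is matrix-chain multiplication: cell (i, j) depends on all split points between i and j, not just the previous stage, yet every cell with the same chain length is independent. That wavefront of independent cells is the fine-grain parallelism a parallel implementation would exploit. The example choice is an assumption for illustration.

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications to multiply a chain of matrices,
    where matrix i has shape dims[i] x dims[i+1]. Non-serial DP: each
    cell (i, j) reads cells from *all* shorter chains inside [i, j]."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):       # wavefront: chains of this length
        # all cells in this wavefront are independent of one another,
        # so the inner loop could run fully in parallel
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(cost[i][k] + cost[k + 1][j] +
                             dims[i] * dims[k + 1] * dims[j + 1]
                             for k in range(i, j))
    return cost[0][n - 1]
```

Serial DP (e.g. a simple recurrence over one index) parallelizes easily; it is this cross-stage dependence structure that makes the non-serial family the hard, and interesting, case.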
A novel application-specific hardware (ASH) unit was designed to form the building block of the Meshotron -- a parallelisation network for three-dimensional (3D) digital waveguide mesh (DWM) room acoustic models. The rectangular mesh topology was selected. This ASH unit was tested using professional hardware simulation tools, assuming 32-bit integer data. Room impulse responses (RIR) were obtained...
We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like multi-core CPUs, GPUs, or FPGAs. This way, a scientist can focus on their domain of expertise by describing their algorithms generically, without the need for knowledge of specific hardware architectures, programming languages,...
Networking performance continues to grow, but processor clock frequencies have not kept pace. Likewise, the latency to primary memory is not expected to improve dramatically either. This is leading computer architects to reconsider the networking subsystem and the roles and responsibilities of hardware and the operating system. This paper presents the first component of a new networking subsystem where the hardware...
In this paper, we explore the use of Graphics Processing Units (GPUs) to solve numerically the nonlinear Gross-Pitaevskii equation with an external potential. Our implementation uses NVIDIA's Compute Unified Device Architecture (CUDA) programming paradigm and demonstrates a speedup of 190x on an NVIDIA Tesla C2050 (Fermi) GPU compared to an optimized software implementation on a single-core of an...
We investigate techniques for optimizing a multi-core CPU code back-ported from a highly optimized GPU kernel. We show that common sub-expression elimination and loop unrolling optimization techniques improve code performance on the GPU, but not on the CPU. On the other hand, register reuse and loop merging are effective on the CPU and in combination they improve performance of the ported code by...
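To illustrate the loop-merging transformation this abstract refers to, a source-level sketch: two passes over an array fused into one, so intermediate values stay in locals/registers instead of a temporary array. The paper's study applies this to a C/CUDA kernel; the Python form, function names, and the toy stencil are assumptions for illustration only, and both versions compute the same result by construction.

```python
def stencil_naive(a):
    """Unfused version: three separate loops, with two temporary arrays
    (b and c) written to and re-read from memory."""
    b = [a[i - 1] + a[i + 1] for i in range(1, len(a) - 1)]
    c = [2 * a[i] for i in range(1, len(a) - 1)]
    return [b[i] + c[i] for i in range(len(b))]

def stencil_merged(a):
    """Loop-merged version: one pass over the data, no temporaries.
    In compiled code the reused a[i-1], a[i], a[i+1] values can stay
    in registers -- the register-reuse effect the abstract mentions."""
    out = []
    for i in range(1, len(a) - 1):
        out.append(a[i - 1] + a[i + 1] + 2 * a[i])
    return out
```

On a cache-based CPU, the merged form avoids streaming the temporaries through memory; on a GPU, where the unfused loops may map to well-coalesced kernels, the same transformation can be neutral or even harmful, which is the asymmetry the paper measures.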