2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)

chapter

Title Page i

2011 Symposium on Application Accelerators in High-Performance Computing > i

chapter

Implications of Memory-Efficiency on Sparse Matrix-Vector Multiplication

Shweta Jain, Robin Pottathuparambil, Ron Sass

2011 Symposium on Application Accelerators in High-Performance Computing > 80 - 83

Sparse Matrix-Vector Multiplication is an important operation for many iterative solvers. However, peak performance is limited because the commonly used algorithm alternates between compute-bound and memory-bound steps. This paper proposes a novel data structure and an FPGA-based hardware core that eliminate the limitations imposed by memory.

chapter

Evaluation of GPU Architectures Using Spiking Neural Networks

Vivek K. Pallipuram, Mohammad A. Bhuiyan, Melissa C. Smith

2011 Symposium on Application Accelerators in High-Performance Computing > 93 - 102

In recent years, General-Purpose Graphics Processing Units (GP-GPUs) have entered the field of High-Performance Computing (HPC) as one of the primary architectural focuses for many research groups working with complex scientific applications. NVIDIA's Tesla C2050, codenamed Fermi, and AMD's Radeon 5870 are two devices positioned to meet the computationally demanding needs of supercomputing research...

chapter

Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation

Nicholas Moore, Miriam Leeser, Laurie Smith King

2011 Symposium on Application Accelerators in High-Performance Computing > 103 - 112

For some classes of problems, the NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering -- small...

chapter

GPU Performance Comparison for Accelerated Radar Data Processing

C.T. Fallen, B.V.C. Bellamy, G.B. Newby, B.J. Watkins

2011 Symposium on Application Accelerators in High-Performance Computing > 84 - 92

Radar is a data-intensive measurement technique often requiring significant processing to make full use of the received signal. However, computing capacity is limited at remote or mobile radar installations, thereby limiting radar data products used for real-time decisions. We used graphics processing units (GPUs) to accelerate processing of high-resolution phase-coded radar data from the Modular UHF...

chapter

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Mayank Daga, Ashwin M. Aji, Wu-chun Feng

2011 Symposium on Application Accelerators in High-Performance Computing > 141 - 149

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of the CPU and GPU, e.g., AMD...

chapter

Real-Time Object Tracking System on FPGAs

Su Liu, Alexandros Papakonstantinou, Hongjun Wang, Deming Chen

2011 Symposium on Application Accelerators in High-Performance Computing > 1 - 7

Object tracking is an important task in computer vision applications. One of the crucial challenges is the real-time speed requirement. In this paper we implement an object tracking system in reconfigurable hardware using an efficient parallel architecture. In our implementation, we adopt a background subtraction based algorithm. The designed object tracker exploits hardware parallelism to achieve...

chapter

Conference Sponsors

2011 Symposium on Application Accelerators in High-Performance Computing > xi

chapter

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Bogdan Vacaliuc, Dilip R. Patlolla, Ed. D'Azevedo, Greg G. Davidson, et al.

2011 Symposium on Application Accelerators in High-Performance Computing > 159 - 167

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in order to achieve...

chapter

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures

Mitch Horton, Stanimire Tomov, Jack Dongarra

2011 Symposium on Application Accelerators in High-Performance Computing > 150 - 158

Three out of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems from the list are using processors with six or more cores. Three-hundred-sixty-five systems use quad-core processor-based systems. Thirty-seven systems are using dual-core processors. The large-scale enabling of hybrid...

chapter

Accelerating a Climate Physics Model with OpenCL

Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou

2011 Symposium on Application Accelerators in High-Performance Computing > 24 - 33

Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase performance even...

chapter

A First Analysis of a Dynamic Memory Allocation Controller (DMAC) Core

Yamuna Rajasekhar, Ron Sass

2011 Symposium on Application Accelerators in High-Performance Computing > 64 - 67

Networking performance continues to grow but processor clock frequencies have not. Likewise, the latency to primary memory is not expected to improve dramatically either. This is leading computer architects to reconsider the networking subsystem and the roles and responsibilities of hardware and the operating system. This paper presents the first component of a new networking subsystem where the hardware...

chapter

Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI Multi GPU Backends with Subdomain Support

Bjorn Meyer, Christian Plessl, Jens Forstner

2011 Symposium on Application Accelerators in High-Performance Computing > 60 - 63

We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like Multi-Core CPUs, GPUs or FPGAs. This way, a scientist can focus on his domain of expertise by describing his algorithms generically without the need to have knowledge of specific hardware architectures, programming languages,...

chapter

G-NetMon: A GPU-accelerated Network Performance Monitoring System

Wenji Wu, Phil DeMar, Don Holmgren, Amitoj Singh

2011 Symposium on Application Accelerators in High-Performance Computing > 76 - 79

At Fermilab, we have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations. In this work, we explore new opportunities in network traffic monitoring and analysis with GPUs. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab and collaboration...

chapter

Efficient Implementation of the Overlap Operator on Multi-GPUs

Andrei Alexandru, Michael Lujan, Craig Pelissier, Ben Gamari, et al.

2011 Symposium on Application Accelerators in High-Performance Computing > 123 - 130

Lattice QCD calculations were one of the first applications to show the potential of GPUs in the area of high performance computing. Our interest is to find ways to effectively use GPUs for lattice calculations using the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In this paper we show the methods we used to implement this operator efficiently...