Search results for: Wu-chun Feng

Items from 1 to 13 out of 13 results

chapter

MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-Core Clusters

Ahmed E. Helal, Paul Sathre, Wu-chun Feng

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 119 - 129

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis

To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within...

chapter

Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems

James E. McClure, Hao Wang, Jan F. Prins, Cass T. Miller, more

2014 IEEE 28th International Parallel and Distributed Processing Symposium > 583 - 592

2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Large-scale simulation can provide a wide range of information needed to develop and validate theoretical models for multiphase flow in porous medium systems. In this paper, we consider a coupled solution in which a multiphase flow simulator is coupled to an analysis approach used to extract the interfacial geometries as the flow evolves. This has been implemented using MPI to target heterogeneous...

chapter

On the Programmability and Performance of Heterogeneous Platforms

Konstantinos Krommydas, Thomas R.W. Scogland, Wu-chun Feng

2013 International Conference on Parallel and Distributed Systems > 224 - 231

2013 International Conference on Parallel and Distributed Systems (ICPADS)

General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including device selection, programming model, and level of investment in optimization. All of these...

article

Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator

Mark Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez

Parallel Computing > 2013 > 39 > 12 > 769-786

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose graphics processing unit (GPU). Two similar programming...

chapter

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming

Ashwin M. Aji, Pavan Balaji, James Dinan, Wu-chun Feng, more

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum > 1020 - 1029

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)

Despite the vast interest in accelerator-based systems, programming large multinode GPUs is still a complex task, particularly with respect to optimal data movement across the host-GPU PCIe connection and then across the network. In order to address such issues, GPU-integrated MPI solutions have been developed that integrate GPU data movement into existing MPI implementations. Currently available...

chapter

Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation

Paul Sathre, Mark Gardner, Wu-chun Feng

2012 41st International Conference on Parallel Processing Workshops > 89 - 96

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

The use of accelerators in high-performance computing is increasing. The most commonly used accelerator is the graphics processing unit (GPU) because of its low cost and massively parallel performance. The two most common programming environments for GPU accelerators are CUDA and OpenCL. While CUDA runs natively only on NVIDIA GPUs, OpenCL is an open standard that can run on a variety of hardware...

chapter

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems

Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, more

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 647 - 654

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement frameworks,...

chapter

Efficient Intranode Communication in GPU-Accelerated Systems

Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1838 - 1847

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques to significantly...

chapter

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

Mayank Daga, Thomas Scogland, Wu-chun Feng

2011 IEEE 17th International Conference on Parallel and Distributed Systems > 316 - 323

2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)

The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize GPU performance...

chapter

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures

Gabriel Martinez, Mark Gardner, Wu-chun Feng

2011 IEEE 17th International Conference on Parallel and Distributed Systems > 300 - 307

2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)

The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks...

chapter

High-performance biocomputing for simulating the spread of contagion over large contact networks

K R Bisset, A M Aji, M V Marathe, Wu-chun Feng

2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) > 26 - 32

2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)

Many important biological problems can be modeled as contagion diffusion processes over interaction networks. This paper shows how the EpiSimdemics interaction-based simulation system can be applied to the general contagion diffusion problem. Two specific problems, computational epidemiology and human immune system modeling, are given as examples. We then show how the graphics processing unit (GPU)...

chapter

GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors

A M Aji, Liqing Zhang, Wu-chun Feng

2010 13th IEEE International Conference on Computational Science and Engineering > 168 - 175

2010 IEEE 13th International Conference on Computational Science and Engineering (CSE 2010)

Next-generation, high-throughput sequencers are now capable of producing hundreds of billions of short sequences (reads) in a single day. The task of accurately mapping the reads back to a reference genome is of particular importance because it is used in several other biological applications, e.g., genome re-sequencing, DNA methylation, and ChIP sequencing. On a personal computer (PC), the computationally...

chapter

To GPU synchronize or not GPU synchronize?

Wu-chun Feng, Shucai Xiao

Proceedings of 2010 IEEE International Symposium on Circuits and Systems > 3801 - 3804

2010 IEEE International Symposium on Circuits and Systems. ISCAS 2010

The graphics processing unit (GPU) has evolved from being a fixed-function processor with programmable stages into a programmable processor with many fixed-function components that deliver massive parallelism. By modifying the GPU's stream processor to support “general-purpose computation” on the GPU (GPGPU), applications that perform massive vector operations can realize many orders-of-magnitude...

