Search results for: Wu-chun Feng

Items from 1 to 12 out of 12 results

chapter

A framework for fast and fair evaluation of automata processing hardware

Xiaodong Yu, Kaixi Hou, Hao Wang, Wu-chun Feng

2017 IEEE International Symposium on Workload Characterization (IISWC) > 120 - 121

Programming Micron's Automata Processor (AP) requires expertise in both automata theory and the AP architecture, as programmers have to manually manipulate state transition elements (STEs) and their transitions with a low-level Automata Network Markup Language (ANML). When the required STEs of an application exceed the hardware capacity, multiple reconfigurations are needed. However, most previous...

chapter

AutoMatch: An automated framework for relative performance estimation and workload distribution on heterogeneous HPC systems

Ahmed E. Helal, Wu-chun Feng, Changhee Jung, Yasser Y. Hanafy

2017 IEEE International Symposium on Workload Characterization (IISWC) > 32 - 42

Porting sequential applications to heterogeneous HPC systems requires extensive software and hardware expertise to estimate the potential speedup and to efficiently use the available compute resources in such systems. To streamline this daunting process, researchers have proposed several “black-box” performance prediction approaches that rely on the performance of a training set of parallel applications...

chapter

Developing dynamic profiling and debugging support in OpenCL for FPGAs

Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, et al.

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...

chapter

MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-Core Clusters

Ahmed E. Helal, Paul Sathre, Wu-chun Feng

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 119 - 129

To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within...

chapter

Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels

Islam Harb, Wu-chun Feng

2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) > 451 - 456

The lack of support for explicit synchronization between the streaming multiprocessors (SMs) in GPUs adversely impacts the ability of GPUs to perform inter-block communication efficiently. In this paper, we present several approaches to inter-block synchronization using explicit/implicit CPU-based and dynamic parallelism (DP) mechanisms. Although this topic has been addressed in previous...

chapter

An automated framework for characterizing and subsetting GPGPU workloads

Vignesh Adhinarayanan, Wu-chun Feng

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 307 - 317

Graphics processing units (GPUs) are becoming increasingly common in today's computing systems due to their superior performance and energy efficiency relative to their cost. To further improve these desired characteristics, researchers have proposed several software and hardware techniques. Evaluation of these proposed techniques could be tricky due to the ad-hoc nature in which applications are...

chapter

On the performance and energy efficiency of FPGAs and GPUs for polyphase channelization

Vignesh Adhinarayanan, Thaddeus Koehn, Krzysztof Kepa, Wu-chun Feng, et al.

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) > 1 - 7

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Wideband channelization is an important and computationally demanding task in the front-end subsystem of several software-defined radios (SDRs). The hardware that supports this task should provide high performance, consume low power, and allow flexible implementations. Several classes of devices have been explored in the past, with the FPGA proving to be the most popular as it reasonably satisfies...

chapter

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

Kaixi Hou, Hao Wang, Wu-chun Feng

2014 43rd International Conference on Parallel Processing Workshops > 273 - 282

2014 43rd International Conference on Parallel Processing Workshops (ICPPW)

Moore's Law effectively doubles the compute power of a microprocessor every 24 months. Over the past decade, however, this doubling in performance has been due to the doubling of the number of cores in a microprocessor rather than clock speed increases. Perhaps nowhere is this more evident than with the Intel Xeon Phi coprocessor. This many-core architecture exhibits not only massive inter-core parallelism...

chapter

Runtime Adaptation for Autonomic Heterogeneous Computing

Thomas R.W. Scogland, Wu-chun Feng

2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 562 - 565

2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Heterogeneity is increasing at all levels of computing, certainly with the rise in general-purpose computing with GPUs in everything from phones to supercomputers. More quietly, it is increasing with the rise of NUMA systems, hierarchical caching, OS noise, and a myriad of other factors. As heterogeneity becomes a fact of life at every level of computing, efficiently managing heterogeneous compute...

chapter

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

Konstantinos Krommydas, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas, et al.

2013 International Conference on Parallel and Distributed Systems > 432 - 433

2013 International Conference on Parallel and Distributed Systems (ICPADS)

The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed...

chapter

Online Performance Projection for Clusters with Heterogeneous GPUs

Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, et al.

2013 International Conference on Parallel and Distributed Systems > 283 - 290

2013 International Conference on Parallel and Distributed Systems (ICPADS)

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Usage cases...

chapter

GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading

A. Singh, P. Balaji, Wu-chun Feng

2009 International Conference on Parallel Processing > 261 - 268

2009 International Conference on Parallel Processing (ICPP 2009)

Hardware-acceleration techniques continue to be used to speed up the execution of scientific codes. To do so, software developers identify portions of these codes that are amenable to offloading and map them to hardware accelerators. However, offloading such tasks to specialized hardware accelerators is non-trivial. Furthermore, these accelerators can add significant cost to a computing system. Consequently,...

INFONA - science communication portal
