As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...
Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy often has to be sacrificed to fit the strict power constraints of mobile systems. However, accuracy is extremely important for the end-user, and today's systems are still unsatisfactory for many applications. The most critical component of an ASR system is the acoustic scoring, as it has a large...
Flow path plays an important role in hydrological analysis and modeling, especially in the dynamic simulation of surface flow discharge. The existing flow-path network model (FPN) can extract the flow path from a random flow-source point to the basin outlet and simplify the three-dimensional terrain surface to a one-dimensional representation. However, with the increase in the number of flow source...
Modern high-performance computing and cloud computing infrastructures often leverage Graphics Processing Units (GPUs) to provide accelerated, massively parallel computational power. This performance gain, however, may also introduce higher energy consumption. The energy challenge becomes more and more pronounced as the system scales. To address this challenge, we propose Archon, a framework for...
We describe our approach to extend the BEAGLE library for high-performance statistical phylogenetic inference (maximum likelihood estimation and Bayesian analysis) in order to support a wider range of modern accelerators and multicore CPUs, and present the corresponding performance results from these platforms. Our solution includes a shared code design providing a uniform interface for a variety...
Nowadays, applications must often handle large amounts of data and apply complex algorithms to them. Performing the computation in parallel is a promising and popular way to meet these performance requirements. Since GPUs are designed to carry out highly parallel computations efficiently, the CPU+GPU heterogeneous architecture has gained increasing popularity in computation-intensive applications...
Availability of affordable hardware that in effect enables desktop supercomputing has made possible more ambitious neural simulations driven by more complex software. However, this opportunity comes with costs: long learning curves to take advantage of the performance possibilities of idiosyncratic, architecturally heterogeneous hardware, and decreasing ability to be confident in the quality of...
Scientists who want to exploit the computing power of the latest parallel architectures are faced with a diverse set of architectures and a number of programming languages, models and approaches. Among several such programming techniques are directive-based programming models, OpenMP and OpenACC. This paper explores the similarities and the functionality gaps between both models and presents insights...
Pre-silicon simulation is one of the key toolsets for computer architects to evaluate and optimize their future designs. As Graphics Processing Units (GPUs) have become the platform of choice in many computing communities due to their impressive processing capabilities, computer architecture researchers need a simulation framework that allows them to quantitatively consider design tradeoffs. In this...
High-performance computing platforms are moving from homogeneous individual units to heterogeneous systems, where each unit is a combination of homogeneous cores and accelerator devices. Accelerators such as GPUs, FPGAs, and DSPs are usually designed for specific, intensive types of computing tasks. The presence of these devices has created fresh and attractive development platforms for...
As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU's resources often under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels...
With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency of some applications. Unfortunately, accelerators don't always live up to their expectations and may under-perform in some situations. Understanding the factors which affect the performance of an accelerator is crucial for both architects and...
The employment of five distinct benchmarks on the Distributed Environment for Academic Computing (DEAC) Cluster at Wake Forest University provides meaningful metrics of cluster processor and memory performance. Given the heterogeneous nature of the DEAC Cluster, the benchmarks chosen account for the specific processor architectures comprising the cluster. The data obtained will be assessed via two modeling...
The GPU has become an important component of high-performance computing systems, and its principal duty is parallel computing rather than graphical display. Determining power and energy consumption is necessary for scaling GPUs. This paper presents a statistical model to evaluate the power and energy consumption of AMD's integrated GPU (iGPU). By collecting performance-counter data from...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...
Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side...
Computing platforms for high-performance and parallel applications have changed rapidly during the past few years, from single to multiple cores, and from traditional Central Processing Units (CPUs) to hybrid systems which combine CPUs with accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi, etc. These developments bring more and more challenges to application developers, especially...
Accelerator-based platforms are heterogeneous in nature, yet most applications avoid heterogeneity, and focus on acceleration alone. Platform-level heterogeneity can bring significant performance improvement, as it essentially means using additional resources for the same computation. But is the performance gained using these additional resources worth the effort to program and deploy heterogeneous...
Object detection is a fundamental challenge facing intelligent applications. Image processing is a promising approach to this end, but its computational cost is often a significant problem. This paper presents schemes for accelerating the deformable part models (DPM) on graphics processing units (GPUs). DPM is a well-known algorithm for image-based object detection, and it achieves high detection...
Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits...