Search results

chapter

A Memory Accessing Method for the Parallel Aho-Corasick Algorithm on GPU

JinMyung Yoon, Kang-Il Choi, HyunJin Kim

2016 International Conference on Information Science and Security (ICISS) > 1 - 3

In this paper, we propose a memory accessing method of Parallel Failureless Aho-Corasick (PFAC) algorithm considering Graphic Processing Unit (GPU) memory architecture for throughput improvement. Compared with Aho-Corasick (AC) Algorithm using Central Processing Unit (CPU) and Data-Parallel Aho-Corasick (DPAC) using Open Multi-Processing (OpenMP), PFAC using GPU achieves high performance advancement...

chapter

Software defined radio implementation of adaptive nonlinear digital self-interference cancellation for mobile inband full-duplex radio

Mona AghababaeeTafreshi, Matias Koskela, Dani Korpi, Pekka Jaaskelainen, more

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 733 - 737

Inband full-duplex radio transceivers offer enhanced spectral efficiency by transmitting and receiving simultaneously at the same frequency. However, deployment of such systems is challenging due to the inherent self-interference stemming from coupling of the transmit signal to the receiver. Furthermore, to track changes in the time-varying self-interference channel, the process needs to be self-adaptive...

chapter

Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks

Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, more

2016 International Conference on Field-Programmable Technology (FPT) > 265 - 268

Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs showing significant improvements in their classification and training times. With these improvements, many frameworks have become available for implementing CNNs...

chapter

Realizing Real-Time Deep Learning-Based Super-Resolution Applications on Integrated GPUs

Sung Ye Kim, Preeti Bindu

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA) > 693 - 696

With recent advances in deep convolutional neural networks (CNN), deep learning has brought significant quality improvement and flexibility on single image super resolution (SR). In this paper, we describe how CNN based SR can be accelerated on integrated GPUs. To this end, we employ a CNN model from an existing single image SR approach, and develop the model within a well-known deep learning framework...

chapter

Fast Race Detection and Profiling Framework for Heterogeneous System

Cheng-Kung Lai, Chih-Wei Yeh, Shih-Hao Hung

2016 International Computer Symposium (ICS) > 525 - 530

Heterogeneous computing is a growing trend in recent computer architecture design and is often used to improve the performance and power efficiency for computing applications by utilizing the special-purpose processors or accelerators, such as the Graphic Computing Unit (GPU), Field Programmable Gate Array (FPGA) and Digital Signal Processor (DSP). With the increase of complexity, the interaction...

chapter

An implementation of analytical power model on integrated GPU

Ning Li, Li Shen, Qi Zhu, Yemao Xu, more

2016 International Symposium on Integrated Circuits (ISIC) > 1 - 4

GPU has become an important component of the high performance computing system and its principal duty is parallel computing rather than graphical display. Determining the power and energy consumption is necessary to the scaling of GPU. This paper presents a statistic model to evaluate the power and energy consumption of AMD's integrated GPU (iGPU). By collecting the data of performance counters from...

chapter

Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters

Dip Sankar Banerjee, Khaled Hamidouche, Dhabaleswar K. Panda

2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) > 144 - 151

Deep learning frameworks have recently gained widespread popularity due to their highly accurate prediction capabilities and availability of low cost processors that can perform training over a large dataset quickly. Given the high core count in modern generation high performance computing systems, training deep networks over large data has now become practical. In this work, while targeting the Computational...

chapter

Optimized GPU Acceleration Algorithm of Convolutional Neural Networks for Target Detection

Shijie Li, Yong Dou, Qi Lv, Qiang Wang, more

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 224 - 230

Target detection is a hard real-time task for video and image processing. This task has recently been accomplished through the feedforward process of convolutional neural networks (CNN), which is usually accelerated by general-purpose graphic units (GPUs). However, there is a challenge for this task. The running speed remains to be improved. In this paper, we present an efficient image combination...

chapter

WAP: The Warp Feature Aware Prefetching Method for LLC on CPU-GPU Heterogeneous Architecture

Minghui Wu, Yulong Pei, Licheng Yu, Tianzhou Chen, more

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 414 - 421

Recently, researchers discovered a GPU has some advantages for non-graphic computing. CPU-GPU heterogeneous architecture combines CPU and GPU to a chip and makes GPU easier to run non-graphic programs. Researchers also proposed LLC (last-level cache) to store and exchange data between CPU and GPU. We discover the LLC hit rate has great influence on memory access performance and system's performance...

chapter

A reuse distance based performance analysis on GPU L1 data cache

Dongwei Wang, Weijun Xiao

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) > 1 - 8

Generally, cache is a bridge between CPU and main memory in order to narrow the gap of performance. As a throughput-oriented device, Graphics Processing Unit (GPU) has already integrated with cache, which is similar to CPU cores in order to exploit the locality of memory accesses. However, the applications in GPGPU computing exhibit distinct memory access patterns compared to the multi-core counterparts...

chapter

Scalable multiple GPU architecture for super multi-view synthesis using MVD

Byoungkyun Kim, Byeongho Choi, Youngbae Hwang

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

This paper presents a scalable multiple GPU architecture for super multi-view (SMV) synthesis using the multi-view video plus depth (MVD) data. SMV synthesis is essential to generate 3D contents for the SMV 3D display with hundred views. SMV 3D display, recently released to support 108 viewpoints, shows the multiplexed result of small viewing interval. Hence, we should synthesize the intermediate...

chapter

Zero and data reuse-aware fast convolution for deep neural networks on GPU

Hyunsun Park, Dongyoung Kim, Junwhan Ahn, Sungjoo Yoo

2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 1 - 10

Convolution operations dominate the total execution time of deep convolutional neural networks (CNNs). In this paper, we aim at enhancing the performance of the state-of-the-art convolution algorithm (called Winograd convolution) on the GPU. Our work is based on two observations: (1) CNNs often have abundant zero weights and (2) the performance benefit of Winograd convolution is limited mainly due...

chapter

Scheduling challenges and opportunities in integrated CPU+GPU processors

Kapil Dev, Sherief Reda

2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia) > 1 - 6

Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping...

chapter

An application of the supervoxel-based Fuzzy C-Means with a GPU support to segmentation of volumetric brain images

Anna Fabijanska, Jaroslaw Goclawski

2016 Federated Conference on Computer Science and Information Systems (FedCSIS) > 777 - 785

In this paper the problem of segmentation of volumetric medical images is considered. The fast and effective segmentation is obtained by applying the proposed approach which combines the idea of supervoxels and the Fuzzy C-Means algorithm. In particular, Fuzzy C-Means is used to cluster supervoxels produced by the fast 3D region growing. Additional acceleration of the method is achieved with the...

chapter

Histogram optimization with CUDA

Keh Kok Yong, Sheera Shaheera Othman Talib

2016 IEEE Industrial Electronics and Applications Conference (IEACon) > 312 - 318

Histogram is a popular analytic graphical representation of data distribution resulting from processing a given numerical input data. Although the sequential histogram computation may be simple, it is no longer suitable in processing high volume of data. With recent advancement of high performance computing (HPC), aided by the accelerating growth of General Purpose Graphic Processing Unit (GPGPU),...

chapter

GPU implementation of multi-scale Retinex image enhancement algorithm

Hui Li, Weihao Xie, Xingang Wang, Shousheng Liu, more

2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) > 1 - 5

Multi-scale Retinex algorithm is an image enhancement algorithm that aims at image reconstruction. The algorithm maintains the high fidelity and the dynamic range compression of the image, so the enhancement effect is obvious. The algorithm exploits a large number of convolution operations to achieve dynamic range compression and color/brightness rendition, and the calculation time increased significantly...

chapter

The Vectorization of the Tersoff Multi-body Potential: An Exercise in Performance Portability

Markus Hohnerbach, Ahmed E. Ismail, Paolo Bientinesi

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 69 - 81

Molecular dynamics simulations, an indispensable research tool in computational chemistry and materials science, consume a significant portion of the supercomputing cycles around the world. We focus on multi-body potentials and aim at achieving performance portability. Compared with well-studied pair potentials, multibody potentials deliver increased simulation accuracy but are too complex for effective...

chapter

Extended Task Queuing: Active Messages for Heterogeneous Systems

Michael LeBeane, Brandon Potter, Abhisek Pan, Alexandru Dutu, more

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 933 - 944

Accelerators have emerged as an important component of modern cloud, datacenter, and HPC computing environments. However, launching tasks on remote accelerators across a network remains unwieldy, forcing programmers to send data in large chunks to amortize the transfer and launch overhead. By combining advances in intra-node accelerator unification with one-sided Remote Direct Memory Access (RDMA)...

chapter

ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs

Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 854 - 865

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation and object detection and localization. Here we consider the parallelization of inference, i.e., the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as the number of output voxels computed per unit...

chapter

Understanding Error Propagation in GPGPU Applications

Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 240 - 251

GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...

INFONA - science communication portal
