Search results

article

A Reconfigurable and Scalable FPGA Architecture for Bilateral Filtering

Swapnil Deelip Dabhade, G. N. Rathna, Kunal Narayan Chaudhury

IEEE Transactions on Industrial Electronics > 2018 > 65 > 2 > 1459 - 1469

Bilateral filter is an edge-preserving smoother that has applications in image processing, computer vision, and computational photography. In the past, field-programmable gate array (FPGA) implementations of the filter have been proposed that can achieve high throughput using parallelization and pipelining. An inherent limitation with direct implementations is that their complexity scales as $O(\omega^2)$...

chapter

Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems

Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 13 - 24

2017 IEEE International Conference on Cluster Computing (CLUSTER)

Multi-/many-core CPU based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly large number of cores on each node, applications spend a significant portion of their execution time in intra-node communication. While shared memory is commonly used for intra-node communication, it needs to copy each message once...

chapter

Fast linear algebra-based triangle counting with KokkosKernels

Michael M. Wolf, Mehmet Deveci, Jonathan W. Berry, Simon D. Hammond, more

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2017 IEEE High Performance Extreme Computing Conference (HPEC)

Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our...

chapter

In-Place Irregular Computation for Message-Passing Chip-Multiprocessors

Zhang Youhui, Zhang Youyang, Li Yanhua, Fei Xiang, more

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 69 - 76

2017 46th International Conference on Parallel Processing Workshops (ICPPW)

With the increase of CMP (Chip-Multiprocessor) scale, moving data to computation on chip becomes more expensive. Accordingly, moving computation to data has potential to improve efficiency. We propose an in-place computation co-design of many-simple-core CMP for irregular applications. The computing paradigm is that an application's critical irregular data (or part of them) is partitioned into on-chip...

chapter

VLSI implementation of LS-SVM training and classification using entropy based subset-selection

Andreas Bytyn, Jannik Springer, Rainer Leupers, Gerd Ascheid

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

Machine Learning techniques such as Support Vector Machines (SVM) have found applications in many fields, e.g. in Wireless Sensor Networks (WSN) and sensor data processing in general. Especially in the case of WSN energy is very limited as agents solely operate based on battery power after they have been deployed, therefore energy efficiency is of great importance. Furthermore, agents are supposed...

chapter

Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs

Qingcheng Xiao, Yun Liang, Liqiang Lu, Shengen Yan, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

Convolutional neural network (CNN) finds applications in a variety of computer vision applications ranging from object recognition and detection to scene understanding owing to its exceptional accuracy. There exist different algorithms for CNNs computation. In this paper, we explore conventional convolution algorithm with a faster algorithm using Winograd's minimal filtering theory for efficient FPGA...

article

Efficient FPGA Implementation of OpenCL High-Performance Computing Applications via High-Level Synthesis

Fahad Bin Muslim, Liang Ma, Mehdi Roozmeh, Luciano Lavagno

IEEE Access > 2017 > 5 > 2747 - 2762

FPGA-based accelerators have recently evolved as strong competitors to the traditional GPU-based accelerators in modern high-performance computing systems. They offer both high computational capabilities and considerably lower energy consumption. High-level synthesis (HLS) can be used to overcome the main hurdle in the mainstream usage of the FPGA-based accelerators, i.e., the complexity of their...

chapter

Mini-apps for high performance data analysis

Sreenivas R. Sukumar, Michael A. Matheson, Ramakrishnan Kannan, Seung-Hwan Lim

2016 IEEE International Conference on Big Data (Big Data) > 1483 - 1492

2016 IEEE International Conference on Big Data (Big Data)

Scaling-up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge that we face today. Despite the growing need for analysis from science domains that are generating ‘Big Data’ from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms have been daunting because: (i) the ‘Big Data’ hardware and software...

chapter

Kernels for scalable data analysis in science: Towards an architecture-portable future

Sreenivas R. Sukumar, Ramakrishnan Kannan, Seung-Hwan Lim, Michael A. Matheson

2016 IEEE International Conference on Big Data (Big Data) > 1026 - 1031

2016 IEEE International Conference on Big Data (Big Data)

In this paper, we pose and address some of the unique challenges in the analysis of scientific Big Data on supercomputing platforms. Our approach identifies, implements and scales numerical kernels that are critical to the instantiation of theory-inspired analytic workflows on modern computing architectures. We present the benefits of scalable kernels towards constructing algorithms such as principal...

chapter

GraVF: A vertex-centric distributed graph processing framework on FPGAs

Nina Engelhardt, Hayden Kwok-Hay So

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

FPGAs are promising platforms to efficiently execute distributed graph algorithms. Unfortunately, they are notoriously hard to program, especially when the problem size and system complexity increases. In this paper, we propose GraVF, a high-level design framework for distributed graph processing on FPGAs. It leverages the vertex-centric paradigm, which is naturally distributed and requires the user...

chapter

Fast spatial-spectral preprocessing for endmember extraction and spectral unmixing using graphic processing units

L. I. Jimenez, G. Martin, S. Sanchez, J. Plaza, more

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 7038 - 7041

IGARSS 2016 - 2016 IEEE International Geoscience and Remote Sensing Symposium

Linear spectral unmixing consists of the identification of spectrally pure constituents, called endmembers, and their corresponding proportions or abundances using a linear model. Traditionally, most of the attention has been focussed on the exploitation of spectral information when identifying a set of endmembers and, only recently, some techniques try to take advantage of complementary information...

chapter

A new method to parallel implementation for batching vast small-scale computing tasks based on GPU

Jun Zhu, Haifeng Yao, Tao Yang, Qiaomei Zhou, more

2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA) > 2092 - 2095

2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA)

The calculation of small-scale data is commonly used in scientific computing and application domain, and the high-efficiency method of small calculation can give play to the potency of many calculation and application. In this paper, a novel self-adaptive parallel computing method based on the graphics processing unit (GPU) architecture for batches of small scale computing tasks is proposed herein...

chapter

A low-cost energy efficient image scaling processor for multimedia applications

Bharat Garg, V N S K Chaitanya Goteti, G K Sharma

2016 20th International Symposium on VLSI Design and Test (VDAT) > 1 - 6

2016 20th International Symposium on VLSI Design and Test (VDAT)

Image scaling is one of the widely used techniques in various portable devices to fit the image in their respective displays. Traditional image scaling architectures consume more power and hardware, making them inefficient for use in portable devices. In this paper, a low complexity image scaling algorithm is proposed. In the proposed algorithm, the target pixel is computed either by bilinear interpolation...

chapter

Optimization of Block Sparse Matrix-Vector Multiplication on Shared-Memory Parallel Architectures

Ryan Eberhardt, Mark Hoemmen

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 663 - 672

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block substructure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. Previous research on SpMV for matrices with dense block substructure has largely focused on the design of novel data structures...

chapter

Accelerating all-pairs shortest path using a message-passing reconfigurable architecture

Osama G. Attia, Alex Grieve, Kevin R. Townsend, Phillip Jones, more

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

In this paper, we study the design and implementation of a reconfigurable architecture for graph processing algorithms. The architecture uses a message-passing model targeting shared-memory multi-FPGA platforms. We take advantage of our architecture to showcase a parallel implementation of the all-pairs shortest path algorithm (APSP) for unweighted directed graphs. Our APSP implementation adopts a...

chapter

GPU accelerated geometric multigrid method: Performance comparison on recent NVIDIA architectures

Iulian Stroia, Lucian Itu, Cosmin Nita, Laszlo Lazar, more

2015 19th International Conference on System Theory, Control and Computing (ICSTCC) > 175 - 179

2015 19th International Conference on System Theory, Control and Computing (ICSTCC)

During the past decade Graphics Processing Units (GPU) have been increasingly employed for speeding up compute intensive scientific applications. In this field, the geometric multigrid method (GMG) is one of the most efficient algorithms for solving large sparse linear systems of equations. Herein we analyze the performance of an optimized GPU based implementation of the GMG method on different state-of-the-art...

chapter

Design and Verification of Heterogeneous Streaming Parallel Mechanisms on Kepler CUDA

Kailong Zhang, Shaoli Zhou, Liang Hu, Hang Su, more

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing > 2256 - 2262

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM)

In many-core based parallel computing field, how to optimally allocate and schedule computing core resources according to characteristics of parallel applications is one typical and fundamental problem, which touches closely to computing performances. After analyzing features and mechanisms of Kepler CUDA architecture, three heterogeneous streaming parallel computing modes and corresponding constraints,...

chapter

A fast, energy-efficient abstraction for simultaneous breadth-first searches

Adam McLaughlin, Jason Riedy, David A. Bader

2015 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

2015 IEEE High Performance Extreme Computing Conference (HPEC)

Optimized GPU kernels are sufficiently complicated to write that they often are specialized to input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation. Our abstraction removes the burden of orchestrating graph traversal from the user while...

chapter

Vector processor for online lithium-ion battery capacity prediction

Yeyong Pang, Shaojun Wang, Yu Peng, Philip H.W. Leong

2015 12th IEEE International Conference on Electronic Measurement & Instruments (ICEMI) > 1 > 254 - 259

2015 12th IEEE International Conference on Electronic Measurement & Instruments (ICEMI)

Battery capacity prediction in aerospace systems is a computationally expensive problem. In this paper, we propose a novel field programmable gate array-based (FPGA) vector processor to reduce latency in this application. This processor architecture is optimized for the kernel recursive least squares (KRLS) algorithm, and used to perform online regression. Pipelining is employed to increase performance...

chapter

Performance optimization for the k-nearest neighbors kernel on x86 architectures

Chenhan D. Yu, Jianyu Huang, Woody Austin, Bo Xiao, more

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

Nearest neighbor search is a cornerstone problem in computational geometry, non-parametric statistics, and machine learning. For N points, exhaustive search requires quadratic work, but many fast algorithms reduce the complexity for exact and approximate searches. The common kernel (kNN kernel) in all these algorithms solves many small-size problems exactly using exhaustive search. We propose an efficient...

INFONA - science communication portal
