Search results

chapter

FlexCL: An analytical performance model for OpenCL workloads on flexible FPGAs

Shuo Wang, Yun Liang, Wei Zhang

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

The recent adoption of the OpenCL programming model by FPGA vendors has realized the functional portability of OpenCL workloads on FPGAs. However, poor performance portability prevents its wide adoption. To harness the power of FPGAs using the OpenCL programming model, it is advantageous to design an analytical performance model to estimate the performance of OpenCL workloads on FPGAs and provide insights...

chapter

A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model

Shuo Wang, Yun Liang

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration due to their advantages of dedicated hardware design. Hence, FPGAs are a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between...

chapter

Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs

Qingcheng Xiao, Yun Liang, Liqiang Lu, Shengen Yan, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Convolutional neural networks (CNNs) are used in a variety of computer vision applications, ranging from object recognition and detection to scene understanding, owing to their exceptional accuracy. There exist different algorithms for CNN computation. In this paper, we explore the conventional convolution algorithm with a faster algorithm using Winograd's minimal filtering theory for efficient FPGA...

chapter

Hardware TCP Offload Engine based on 10-Gbps Ethernet for low-latency network communication

Li Ding, Ping Kang, Wenbo Yin, Linli Wang

2016 International Conference on Field-Programmable Technology (FPT) > 269 - 272

This paper introduces a hardware TCP Offload Engine (TOE) aimed at low-latency communication systems. The throughput can reach 9.99 Gbps with jumbo frames. The input-to-output receiving latency of a packet consisting of a 100-byte payload and a 64-byte header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...

chapter

FPGA-based acceleration of FDAS module using OpenCL

Haomiao Wang, Ming Zhang, Prabu Thiagaraj, Oliver Sinnen

2016 International Conference on Field-Programmable Technology (FPT) > 53 - 60

The Square Kilometre Array (SKA) project will be the world's largest radio telescope array. With the growth of the number of antennas, the signals that need to be processed increase dramatically. One important element of the SKA central signal processor (CSP) package is pulsar search. This paper focuses on the FPGA-based acceleration of the frequency-domain acceleration search (FDAS) module, part of SKA...

chapter

Random projections for scaling machine learning on FPGAs

Sean Fox, Stephen Tridgell, Craig Jin, Philip H.W. Leong

2016 International Conference on Field-Programmable Technology (FPT) > 85 - 92

Random projections have recently emerged as a powerful technique for large scale dimensionality reduction in machine learning applications. Crucially, the projection can be obtained from sparse probability distributions, enabling hardware implementations with little overhead. In this paper, we describe a Field-Programmable Gate Array (FPGA) implementation alongside a kernel adaptive filter (KAF) that...

chapter

Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks

Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, more

2016 International Conference on Field-Programmable Technology (FPT) > 265 - 268

Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs showing significant improvements in their classification and training times. With these improvements, many frameworks have become available for implementing CNNs...

chapter

Tessellation-based multi-block memory mapping scheme for high-level synthesis with FPGA

Juan Escobedo, Mingjie Lin

2016 International Conference on Field-Programmable Technology (FPT) > 125 - 132

For many intensive computing tasks, simultaneous data access into multi-dimensional data arrays is highly restricted by the data mapping strategy and memory port constraints. As such, to increase memory access bandwidth, innovative memory partitioning and mapping algorithms have been proposed to simultaneously access multiple memory blocks through physically distributing data elements in the same...

chapter

High-Level Designs of Complex FIR Filters on FPGAs for the SKA

Haomiao Wang, Joao Gante, Ming Zhang, Gabriel Falcao, more

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 797 - 804

High-end FPGAs are widely adopted as hardware accelerators, due to their power efficiency, flexibility, and high-performance computing ability. They are, therefore, extremely useful devices for a project with challenges and constraints such as the Square Kilometre Array (SKA). However, the traditional design methods require expert hardware knowledge and long development times for each of the SKA's...

chapter

FPGA implementation of the coupled filtering method

Chen Zhang, Tianzhu Liang, Philip K.T. Mok, Weichuan Yu

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) > 435 - 442

In ultrasound image analysis, speckle tracking methods are widely applied to study the elasticity of body tissue. However, “feature-motion decorrelation” remains a challenge for speckle tracking methods. Recently, a coupled filtering method was proposed to accurately estimate strain values when the tissue deformation is large. The major drawback of the new method is its high computational...

chapter

Fast and cycle-accurate simulation of multi-threaded applications on SMP architectures using hybrid prototyping

Ehsan Saboori, Samar Abdi

2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 1 - 10

This paper presents a fast and cycle accurate simulation environment for early power-performance analysis of multi-threaded applications targeted to symmetric multiprocessing embedded architectures. Our simulation environment leverages the hybrid prototyping technique, where a lightweight emulation kernel performs logical simulation of multiple identical cores on top of a single physical instance...

chapter

Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs

Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, more

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 409 - 420

We evaluate the power and performance of the Rodinia benchmark suite using the Altera SDK for OpenCL targeting a Stratix V FPGA against a modern CPU and GPU. We study multiple OpenCL kernels per benchmark, ranging from direct ports of the original GPU implementations to loop-pipelined kernels specifically optimized for FPGAs. Based on our results, we find that even though OpenCL is functionally portable...

chapter

An OpenCL Framework for Distributed Apps on a Multidimensional Network of FPGAs

Abhijeet Lawande, Alan D. George, Herman Lam

2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3) > 42 - 49

In an effort to offset the rapidly increasing data volume processed by large data centers today, their architects have increasingly been exploring unconventional architectures like FPGAs. Large-scale RC systems like Novo-G# show promise for both big-data processing and HPC, but are limited by a lengthy and difficult design process. In this paper we present a mixed MPI/OpenCL framework that enables...

chapter

Hardware thread reordering to boost OpenCL throughput on FPGAs

Amir Momeni, Hamed Tabkhi, Gunar Schirner, David Kaeli

2016 IEEE 34th International Conference on Computer Design (ICCD) > 257 - 264

Availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels with runtime-dependent...

chapter

Tuning Stencil codes in OpenCL for FPGAs

Qi Jia, Huiyang Zhou

2016 IEEE 34th International Conference on Computer Design (ICCD) > 249 - 256

OpenCL is designed as a parallel programming framework to support heterogeneous computing platforms. The implicit or explicit parallelism in OpenCL kernel code enables efficient FPGA implementation from a high-level programming abstraction. However, FPGA architecture is completely different from GPU architecture, for which OpenCL is widely used. Tuning OpenCL codes to achieve high performance on FPGAs...

chapter

Linux task scheduler for reconfigurable hardware accelerators

Petr Cvek, Ondrej Novak

2016 15th Biennial Baltic Electronics Conference (BEC) > 71 - 74

This article proposes a modification of the standard Linux scheduler to support a reconfigurable heterogeneous multiprocessor system. The standard Linux scheduler is limited to homogeneous multiprocessor systems only. Adding a processing core with different features requires modifying the scheduler's decision algorithm, as a heterogeneous task cannot be executed on any processing...

chapter

CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

2016 IEEE 34th International Conference on Computer Design (ICCD) > 320 - 327

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. With massive parallelism of computational units, however, the external memory bandwidth, which is constrained...

chapter

Performance evaluation of Stratix V DE5-Net FPGA board for high performance computing

Iman Firmansyah, Yoshiki Yamaguchi, Taisuke Boku

2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA) > 23 - 27

The FPGA, or Field Programmable Gate Array, has been widely used for applications such as digital signal and image processing, video processing, software-defined radio, radar processing, medical imaging and so on. Currently, with the significant growth of parallel computing and cloud computing applications, the FPGA provides another solution for high performance computing instead of CPU or GPGPU due...

chapter

Synthesis and evaluation of SHA-1 algorithm using Altera SDK for OpenCL

Ian Janik, Mohammed A. S. Khalid

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) > 1 - 4

This paper uses the Altera SDK for OpenCL (AOCL) High-Level Synthesis (HLS) tool to accelerate the computation of the SHA-1 hash function. Using FPGAs to increase the throughput of this algorithm has been a popular research topic. The work done thus far focuses on HDL-based design methodologies. The goal of this paper is to determine if the HLS implementation can compare in terms of speed to the HDL...

chapter

Emulation of processing in memory architecture for application development

Jin-San Kwon, Tae-ho Hwang, Dong-Sun Kim

2016 International SoC Design Conference (ISOCC) > 183 - 184

Since new technologies like big data and cloud computing require a tremendous number of transactions between processors and memory, a new memory system called Processing in Memory (PIM) architecture has been suggested as a solution for those memory-intensive applications. To make software utilize the new architecture, a development environment with tool chain and debug infrastructures...
