Search results

Items from 1 to 20 out of 28 results

chapter

OpenCL-based design pattern for line rate packet processing

Jehandad Khan, Peter Athanas, Skip Booth, John Marshall

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 190 - 194

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

The ever changing nature of network technology requires a flexible platform that can change as the technology evolves. In this work, a complete networking switch designed in OpenCL is presented, identifying several high-level constructs that form the building blocks of any network application targeting FPGAs. These include the notion of an on-chip global memory and kernels constantly processing data...

chapter

Fast and efficient implementation of Convolutional Neural Networks on FPGA

Abhinav Podili, Chi Zhang, Viktor Prasanna

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 11 - 18

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

State-of-the-art CNN models for Image recognition use deep networks with small filters instead of shallow networks with large filters, because the former requires fewer weights. In the light of above trend, we present a fast and efficient FPGA based convolution engine to accelerate CNN models over small filters. The convolution engine implements Winograd minimal filtering algorithm to reduce the number...

chapter

Hardware TCP Offload Engine based on 10-Gbps Ethernet for low-latency network communication

Li Ding, Ping Kang, Wenbo Yin, Linli Wang

2016 International Conference on Field-Programmable Technology (FPT) > 269 - 272

2016 International Conference on Field-Programmable Technology (FPT)

This paper introduces a hardware TCP Offload Engine (TOE) aiming at low-latency communication systems. The throughput can reach 9.99 Gbps with the Jumbo frame. The input-to-output receiving latency of a packet consists of 100 bytes payload and 64 bytes header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...

chapter

Hardware thread reordering to boost OpenCL throughput on FPGAs

Amir Momeni, Hamed Tabkhi, Gunar Schirner, David Kaeli

2016 IEEE 34th International Conference on Computer Design (ICCD) > 257 - 264

2016 IEEE 34th International Conference on Computer Design (ICCD)

Availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels with runtime-dependent...

chapter

CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

2016 IEEE 34th International Conference on Computer Design (ICCD) > 320 - 327

2016 IEEE 34th International Conference on Computer Design (ICCD)

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN's implementation. At massive parallelism of computational units, however, the external memory bandwidth, which is constrained...

chapter

Performance evaluation of Stratix V DE5-Net FPGA board for high performance computing

Iman Firmansyah, Yoshiki Yamaguchi, Taisuke Boku

2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA) > 23 - 27

2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA)

FPGA, or Field Programmable Gate Array, has been widely used for several applications such as digital signal and image processing, video processing, software-defined radio, radar processing, medical imaging and so on. Currently, with the significance growth of parallel computing and cloud computing application, FPGA provides another solution for high performance computing instead of CPU or GPGPU due...

chapter

Synthesis and evaluation of SHA-1 algorithm using Altera SDK for OpenCL

Ian Janik, Mohammed A. S. Khalid

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) > 1 - 4

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS)

This paper uses the Altera SDK for OpenCL (AOCL) High-Level Synthesis (HLS) tool to accelerate the computation of the SHA-1 hash function. Using FPGAs to increase throughput of this algorithm has been a popular topic in research. The work done thus far, focuses on HDL based design methodologies. The goal of this paper is to determine if the HLS implementation can compare in terms of speed to the HDL...

chapter

FPGA kernels for classification rule induction

P. Skoda, B. Medved Rogina

2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 337 - 342

2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Classification is one of the core tasks in machine learning data mining. One of several models of classification are classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in rule induction process,...

chapter

Throughput oriented FPGA overlays using DSP blocks

Abhishek Kumar Jain, Douglas L. Maskell, Suhaib A. Fahmy

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1628 - 1633

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited...

chapter

OpenCL library of stream memory components targeting FPGAs

Jasmina Vasiljevic, Ralph Wittig, Paul Schumacher, Jeff Fifield, et al.

2015 International Conference on Field Programmable Technology (FPT) > 104 - 111

2015 International Conference on Field Programmable Technology (FPT)

In recent years, high-level languages and compilers, such as OpenCL have improved both productivity and FPGA adoption on a wider scale. One of the challenges in the design of high-performance stream FPGA applications is iterative manual optimization of the numerous application buffers (e.g., arrays, FIFOs and scratch-pads). First, to achieve the desired throughput, the programmer faces the burden...

chapter

Implementing Ultra Low Latency Data Center Services with Programmable Logic

John W. Lockwood, Madhu Monga

2015 IEEE 23rd Annual Symposium on High-Performance Interconnects > 68 - 77

2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI)

Data centers require many low-level network services to implement high-level applications. Key-Value Store (KVS) is a critical service that associates values with keys and allows machines to share these associations over a network. Most existing KVS systems run in software and scale out by running parallel processes on multiple microprocessor cores to increase throughput. In this paper, we take an...

chapter

Characterization of OpenCL on a scalable FPGA architecture

Shanyuan Gao, Jeremy Chritz

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) > 1 - 6

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research have shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work has proposed a scalable FPGA architecture for high performance computing. The design...

chapter

Frequency table computation on dataflow architecture

P. Skoda, V. Sruk, B. Medved Rogina

2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 342 - 346

2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Frequency table computation is a key step in decision tree learning algorithms. In this paper we present a novel implementation targeted for dataflow architecture implemented on field programmable gate array (FPGA). Consistent with dataflow model of computation, the kernel views input dataset as synchronous streams of attributes and class values. The kernel was benchmarked using key functions from...

chapter

Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs

Frederico Pratas, Joao Andrade, Gabriel Falcao, Vitor Silva, et al.

2013 IEEE Global Conference on Signal and Information Processing > 1274 - 1277

2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

State-of-the-art decoders for LDPC codes adopted by several digital communication standards require a significant amount of hardware resources to achieve the desired high throughput performance. With technology scaling below the 22nm and with billions of transistors available per chip/device, the development cost and complexity of such designs represent an increasing challenge for hardware designers...

chapter

Active SSD design for energy-efficiency improvement of web-scale data analysis

Jian Ouyang, Shiding Lin, Zhenyu Hou, Peng Wang, et al.

International Symposium on Low Power Electronics and Design (ISLPED) > 286 - 291

2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED)

NAND flash based solid state drives (SSDs) have been widely adopted as storage devices in modern data centers to provide high performance I/O services. Recently, researchers proposed several schemes to improve energy efficiency of the system by off-loading specific computation tasks from generic processors to local processing elements in SSD controllers. However, it is inefficient to directly apply...

chapter

Throughput-oriented kernel porting onto FPGAs

Alexandros Papakonstantinou, Deming Chen, Wen-Mei Hwu, Jason Cong, et al.

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 10

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)

Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this...

chapter

A kernel interleaved scheduling method for streaming applications on soft-core vector processors

Chengwei Zheng, John McAllister, Yun Wu

2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation > 278 - 285

2011 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XI)

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures...

chapter

A novel FPGA-based SVM classifier

M Papadonikolakis, C Bouganis

2010 International Conference on Field-Programmable Technology > 283 - 286

2010 International Conference on Field-Programmable Technology (FPT 2010)

Support Vector Machines (SVMs) are a powerful supervised learning tool, providing state-of-the-art accuracy at a cost of high computational complexity. The SVM classification suffers from linear dependencies on the number of the Support Vectors and the problem's dimensionality. In this work, we propose a scalable FPGA architecture for the acceleration of SVM classification, which exploits the device...

chapter

MARC: A Many-Core Approach to Reconfigurable Computing

I Lebedev, Shaoyi Cheng, A Doupnik, J Martin, et al.

2010 International Conference on Reconfigurable Computing and FPGAs > 7 - 12

2010 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2010)

We present a Many-core Approach to Reconfigurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant special FPGA resources such as distributed block memories and DSP blocks to implement complete single-chip high efficiency many-core micro architectures. The key benefits of MARC...

chapter

Automatic Generation of Stream Descriptors for Streaming Architectures

Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, et al.

2010 39th International Conference on Parallel Processing > 307 - 312

39th International Conference on Parallel Processing (ICPP 2010)

We describe a novel approach for automatically generating streaming architectures from software programs. While existing systems require user-defined stream models, our method automatically identifies producer-consumer streaming relationships and translates them into streaming architectures. Data streams between producer-consumer kernels are represented using a combination of stream descriptors and...

Keywords:
FIELD PROGRAMMABLE GATE ARRAYS
THROUGHPUT
KERNEL

INFONA - science communication portal
