Search results

chapter

A 142MOPS/mW integrated programmable array accelerator for smart visual processing

Satyajit Das, Davide Rossi, Kevin J. M. Martin, Philippe Coussy, more

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

Due to increasing demand for low-power computing, and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, targeted to accelerate both data and control-flow parts of deeply embedded...

chapter

Evaluation of a declarative Linux kernel FPGA manager for dynamic partial reconfiguration

Ulrich Langenbach, Stefan Wiehler, Endric Schubert

2017 International Conference on FPGA Reconfiguration for General-Purpose Computing (FPGA4GPC) > 13 - 18

2017 International Conference on FPGA Reconfiguration for General-Purpose Computing (FPGA4GPC)

Heterogeneous Multi-Processor Systems-on-Chip, whether ARM or x86 based, promise further performance scalability by complementing temporal compute in CPUs/GPUs with spatial compute in digital circuitry. Dynamic partial reconfiguration (DPR) extends such compute architectures by making use of different spatial compute elements over time. Novel research [1] presents means for operating DPR by the Linux...

chapter

A generic execution framework for shared FPGA-based accelerators

Dumitru Laurentiu Alexandru, Rares Maniu

2017 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines and Power Electronics (ACEMP) > 803 - 808

2017 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines and Power Electronics (ACEMP)

FPGAs are continuously increasing in both chip size and operating frequency. Dynamic reconfiguration is easier and more stable with the current generation of hardware and software tools. These characteristics have made them more accessible to generic acceleration tasks instead of specialized functions. As a consequence, FPGAs are being deployed in more computing clusters than in the past. This leads to...

chapter

Alternative Processor Within Threshold: Flexible Scheduling on Heterogeneous Systems

Sonia Lopez, Stavan Satish Karia

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 42 - 53

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Computing systems have become increasingly heterogeneous, contributing to higher performance and power efficiency. However, this comes at the cost of increasing the overall complexity of designing such systems. One key challenge in the design of heterogeneous systems is the efficient scheduling of computational load. To address this challenge, this paper thoroughly analyzes state-of-the-art scheduling...

chapter

Portable Implementation of Advanced Driver-Assistance Algorithms on Heterogeneous Architectures

Oliver Jakob Arndt, Fabian David Trager, Tobias Moß, Holger Blume

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 6 - 17

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

The increased use of application-specific computational devices turns even low-power chips into high-performance computers. Not only additional accelerators (e.g., GPU, DSP, or even FPGA), but also heterogeneous CPU clusters form modern computer systems. Programming these chips is however challenging, due to management overhead, data transfer delays, and a missing unification of the programming flow...

chapter

FPGA implementation of real time video signal processing using Sobel, Robert, Prewitt and Laplacian filters

Emrah Onat

2017 25th Signal Processing and Communications Applications Conference (SIU) > 1 - 4

2017 25th Signal Processing and Communications Applications Conference (SIU)

In this paper, an FPGA-based hardware implementation of edge detection on real-time video signals using Sobel, Robert, Prewitt and Laplacian filters is explained. In addition, the filters are compared in many ways. Edge detection is an elementary and fundamental tool for image segmentation and feature extraction. Very high speed hardware like FPGAs is used to implement the image and video processing algorithms...

chapter

FxpNet: Training a deep convolutional neural network in fixed-point representation

Xi Chen, Xiaolin Hu, Hucheng Zhou, Ningyi Xu

2017 International Joint Conference on Neural Networks (IJCNN) > 2494 - 2501

2017 International Joint Conference on Neural Networks (IJCNN)

We introduce FxpNet, a framework to train deep convolutional neural networks with low bit-width arithmetics in both forward pass and backward pass. During training FxpNet further reduces the bit-width of stored parameters (also known as primal parameters) by adaptively updating their fixed-point formats. These primal parameters are usually represented in the full resolution of floating-point values...

chapter

Exploring optimized accelerator design for binarized convolutional neural networks

Kodai Ueyoshi, Kota Ando, Kentaro Orimo, Masayuki Ikebe, more

2017 International Joint Conference on Neural Networks (IJCNN) > 2510 - 2516

2017 International Joint Conference on Neural Networks (IJCNN)

The convolutional neural network (CNN) is a state-of-the-art model that can achieve significantly high accuracy in many machine-learning tasks. Recently, for further developing the practical applications of CNNs, efficient hardware platforms for accelerating CNNs have been thoroughly studied. A binarized neural network has been reported to minimize the multipliers, which consume a large amount of resources,...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

Fabrication security and trust of domain-specific ASIC processors

Michael Vai, Karen Gettings, Theodore Lyszczarz

2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST) > 152

2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST)

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in an untrusted...

chapter

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

Yijin Guan, Hao Liang, Ningyi Xu, Wenqiang Wang, more

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 152 - 159

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

DNNs (Deep Neural Networks) have demonstrated great success in numerous applications such as image classification, speech recognition, video analysis, etc. However, DNNs are much more computation-intensive and memory-intensive than previous shallow models. Thus, it is challenging to deploy DNNs in both large-scale data centers and real-time embedded systems. Considering performance, flexibility, and...

chapter

An FPGA Design Framework for CNN Sparsification and Acceleration

Sicheng Li, Wei Wen, Yu Wang, Song Han, more

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 28

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implementations...

chapter

Evaluating Rapid Application Development with Python for Heterogeneous Processor-Based FPGAs

Andrew G. Schmidt, Gabriel Weisz, Matthew French

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 121 - 124

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

As modern FPGAs evolve to include more heterogeneous processing elements, such as ARM cores, it makes sense to consider these devices as processors first and FPGA accelerators second. As such, the conventional FPGA development environment must also adapt to support more software-like programming functionality. While high-level synthesis tools can help reduce FPGA development time, there still remains...

chapter

Efficient Particle-Grid Space Interpolation of an FPGA-Accelerated Particle-in-Cell Plasma Simulation

Almomany Abedalmuhdi, B. Earl Wells, Ken-Ichi Nishikawa

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 76 - 79

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

This paper highlights on-going research to effectively utilize a commercially available spatially reconfigurable platform and the OpenCL framework to improve the run-time performance and reduce the overall energy consumption of an existing 2.5D Electrostatic Particle-in-Cell type plasma simulation. This problem is constrained by the finite internal FPGA resources and the performance mandate that all...

chapter

The implementation of edge detection on HSA environment

Sethakarn Prongnuch, Theerayod Wiangtong

2017 International Electrical Engineering Congress (iEECON) > 1 - 4

2017 International Electrical Engineering Congress (iEECON)

This paper presents the implementation of image edge detection on Heterogeneous System Architecture (HSA). HSA, which includes an ARM processor, a coprocessor and an FPGA, is compared with an x64 CPU in terms of performance and power consumption. The experimental results show that although the best execution time is from the x64 CPU, HSA is 50 times more energy efficient. Also, HSA can exploit coprocessors and...

chapter

VLSI Realization of Lanczos Interpolation for a Generic Video Scaling Algorithm

S. Safinaz, A. V. Ravi Kumar

2017 International Conference on Recent Advances in Electronics and Communication Technology (ICRAECT) > 17 - 23

2017 International Conference on Recent Advances in Electronics and Communication Technology (ICRAECT)

Video scaling is a process of resizing a digital frame for preferred view-ability without losing the original content of the video, involving a trade-off between efficiency, smoothness and sharpness. In this research paper, a Generic Algorithm is proposed for enhancement of a motion picture with a given scaling factor without compromising on the picture quality. The proposed algorithm has been verified...

chapter

Stochastic-based multi-stage streaming realization of deep convolutional neural network

Mohammed Alawad, Mingjie Lin

2017 18th International Symposium on Quality Electronic Design (ISQED) > 13 - 18

2017 18th International Symposium on Quality Electronic Design (ISQED)

The large-scale convolutional neural network (CNN), conceptually mimicking the operational principle of visual perception in the human brain, has been widely applied to tackle many challenging computer vision and artificial intelligence applications. Unfortunately, despite its simple architecture, a typically-sized CNN is well known to be computationally intensive. This work presents a novel stochastic-based...

chapter

Design Space exploration of FPGA-based accelerators with multi-level parallelism

Guanwen Zhong, Alok Prakash, Siqi Wang, Yun Liang, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1141 - 1146

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Applications containing compute-intensive kernels with nested loops can effectively leverage FPGAs to exploit fine- and coarse-grained parallelism. HLS tools used to translate these kernels from high-level languages (e.g., C/C++), however, are inefficient in exploiting multiple levels of parallelism automatically, thereby producing sub-optimal accelerators. Moreover, the large design space resulting...

chapter

From exaflop to exaflow

Tobias Becker, Pavel Burovskiy, Anna Maria Nestorov, Hristina Palikareva, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 404 - 409

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Exascale computing is facing a gap between the ever increasing demand for application performance and the underlying chip technology that no longer delivers the expected exponential increases in CPU performance. The industry is now progressively moving towards dedicated accelerators to deliver high performance and better energy efficiency. However, the question of programmability still remains...

chapter

Developing dynamic profiling and debugging support in OpenCL for FPGAs

Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...

INFONA - science communication portal
