The recent adoption of the OpenCL programming model by FPGA vendors has realized the functional portability of OpenCL workloads on FPGAs. However, poor performance portability prevents its wide adoption. To harness the power of FPGAs using the OpenCL programming model, it is advantageous to design an analytical performance model to estimate the performance of OpenCL workloads on FPGAs and provide insights...
Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration due to the advantages of their dedicated hardware design. Hence, FPGAs are a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between...
Convolutional neural networks (CNNs) are used in a variety of computer vision applications, ranging from object recognition and detection to scene understanding, owing to their exceptional accuracy. There exist different algorithms for CNN computation. In this paper, we explore the conventional convolution algorithm alongside a faster algorithm using Winograd's minimal filtering theory for efficient FPGA...
This paper introduces a hardware TCP Offload Engine (TOE) aimed at low-latency communication systems. The throughput can reach 9.99 Gbps with jumbo frames. The input-to-output receiving latency of a packet consisting of a 100-byte payload and a 64-byte header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...
The Square Kilometre Array (SKA) project will be the world's largest radio telescope array. As the number of antennas grows, the volume of signals that need to be processed increases dramatically. One important element of the SKA central signal processor (CSP) package is pulsar search. This paper focuses on the FPGA-based acceleration of the frequency-domain acceleration search (FDAS) module, part of SKA...
Random projections have recently emerged as a powerful technique for large scale dimensionality reduction in machine learning applications. Crucially, the projection can be obtained from sparse probability distributions, enabling hardware implementations with little overhead. In this paper, we describe a Field-Programmable Gate Array (FPGA) implementation alongside a kernel adaptive filter (KAF) that...
Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs, showing significant improvements in their classification and training times. With these improvements, many frameworks have become available for implementing CNNs...
For many compute-intensive tasks, simultaneous data access to multi-dimensional data arrays is highly restricted by the data mapping strategy and memory port constraints. As such, to increase memory access bandwidth, innovative memory partitioning and mapping algorithms have been proposed to simultaneously access multiple memory blocks through physically distributing data elements in the same...
High-end FPGAs are widely adopted as hardware accelerators, due to their power efficiency, flexibility, and high-performance computing ability. They are, therefore, extremely useful devices for a project with challenges and constraints such as the Square Kilometre Array (SKA). However, the traditional design methods require expert hardware knowledge and long development times for each of the SKA's...
In ultrasound image analysis, speckle tracking methods are widely applied to study the elasticity of body tissue. However, “feature-motion decorrelation” remains a challenge for speckle tracking methods. Recently, a coupled filtering method was proposed to accurately estimate strain values when the tissue deformation is large. The major drawback of the new method is its high computational...
This paper presents a fast and cycle-accurate simulation environment for early power-performance analysis of multi-threaded applications targeted at symmetric multiprocessing embedded architectures. Our simulation environment leverages the hybrid prototyping technique, where a lightweight emulation kernel performs logical simulation of multiple identical cores on top of a single physical instance...
We evaluate the power and performance of the Rodinia benchmark suite using the Altera SDK for OpenCL targeting a Stratix V FPGA against a modern CPU and GPU. We study multiple OpenCL kernels per benchmark, ranging from direct ports of the original GPU implementations to loop-pipelined kernels specifically optimized for FPGAs. Based on our results, we find that even though OpenCL is functionally portable...
In an effort to offset the rapidly increasing data volume processed by large data centers today, their architects have increasingly been exploring unconventional architectures like FPGAs. Large-scale RC systems like Novo-G# show promise for both big-data processing and HPC, but are limited by a lengthy and difficult design process. In this paper we present a mixed MPI/OpenCL framework that enables...
Availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels with runtime-dependent...
OpenCL is designed as a parallel programming framework to support heterogeneous computing platforms. The implicit or explicit parallelism in OpenCL kernel code enables efficient FPGA implementation from a high-level programming abstraction. However, FPGA architecture is completely different from GPU architecture, for which OpenCL is widely used. Tuning OpenCL code to achieve high performance on FPGAs...
This article proposes a modification of the standard Linux scheduler to support a reconfigurable heterogeneous multiprocessor system. The standard Linux scheduler is limited to homogeneous multiprocessor systems. Adding a processing core with different features requires modifying the scheduler's decision algorithm, as a heterogeneous task cannot be executed on any processing...
Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Because CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. With massive parallelism of computational units, however, the external memory bandwidth, which is constrained...
The FPGA, or Field Programmable Gate Array, has been widely used for applications such as digital signal and image processing, video processing, software-defined radio, radar processing, medical imaging and so on. Currently, with the significant growth of parallel computing and cloud computing applications, the FPGA provides another solution for high performance computing instead of the CPU or GPGPU due...
This paper uses the Altera SDK for OpenCL (AOCL) High-Level Synthesis (HLS) tool to accelerate the computation of the SHA-1 hash function. Using FPGAs to increase the throughput of this algorithm has been a popular research topic. The work done thus far focuses on HDL-based design methodologies. The goal of this paper is to determine if the HLS implementation can compare in terms of speed to the HDL...
Since new technologies like big data and cloud computing require a tremendous number of transactions between processors and memory, a new memory system architecture called Processing in Memory (PIM) has been suggested as a solution for those memory-intensive applications. To make software utilize the new architecture, a development environment with a toolchain and debug infrastructure...