We present a click-element-based asynchronous loop structure for the control path of an asynchronous micro-control unit (MCU). The loop, which has a one-stage control circuit instead of cascaded circuits, can be triggered by a single trigger signal and stopped by a preset number. To verify the loop structure, we design an asynchronous MCU simulated on an FPGA. The experimental results show that the MCU can be...
Implementing elliptic curve point multiplication (ECPM) based on a residue number system (RNS) can use FPGA resources efficiently. In this paper, we propose a modular reduction method in which a kind of RNS pair is selected to achieve fast reduction. Our reduction method mainly needs several parallel additions, while the reduction unit of previous designs requires two multiplications which are computed...
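As a rough illustration of why RNS arithmetic maps well onto parallel hardware, the following Python sketch shows channel-wise addition and multiplication plus reconstruction via the Chinese Remainder Theorem. The moduli `(255, 256, 257)` and all function names are hypothetical choices for this example; the sketch does not reproduce the reduction method or the RNS pair proposed in the paper.

```python
from math import prod

# Hypothetical pairwise-coprime basis (2^8 - 1, 2^8, 2^8 + 1); not taken from the paper.
MODULI = (255, 256, 257)


def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its residues modulo each basis element."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; the channels are independent, so hardware can run them in parallel."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer in [0, M) with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


if __name__ == "__main__":
    a, b = 1234, 5678
    print(from_rns(rns_mul(to_rns(a), to_rns(b))), (a * b) % prod(MODULI))
```

Results are only meaningful modulo the product of the basis, which is why a practical design must pick a basis large enough for the operands and still needs a separate modular reduction step.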
In this paper, we describe an FPGA system for the real-time processing of Poisson Image Editing. Poisson Image Editing is a powerful method for seamlessly overlaying one image on another. In this method, however, a simple equation is repeatedly applied to each pixel, and this repetition makes its computational complexity very high. In our system, a very deep pipeline is used to apply the equation...
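To make the "simple equation repeatedly applied to each pixel" concrete, here is a minimal pure-Python sketch that assumes the standard Jacobi iteration for the discrete Poisson equation with the source image's gradients as the guidance field. The function name `poisson_blend`, the single-channel list-of-lists image format, and the requirement that the mask stays away from the image border are assumptions of this example, not details from the paper.

```python
def poisson_blend(target, source, mask, iterations=200):
    """Blend `source` into `target` wherever `mask` is true (single channel).

    Assumes all three grids have the same size and that no masked pixel
    lies on the image border.
    """
    h, w = len(target), len(target[0])
    # Start from the source inside the mask and the target elsewhere.
    f = [[float(source[y][x]) if mask[y][x] else float(target[y][x])
          for x in range(w)] for y in range(h)]
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        nxt = [row[:] for row in f]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if not mask[y][x]:
                    continue  # pixels outside the mask keep the target value
                acc = 0.0
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    # Neighbour estimate plus the source gradient towards it.
                    acc += f[ny][nx] + (source[y][x] - source[ny][nx])
                nxt[y][x] = acc / 4.0
        f = nxt
    return f


if __name__ == "__main__":
    size = 5
    target = [[10.0] * size for _ in range(size)]
    source = [[3.0] * size for _ in range(size)]
    mask = [[0 < y < size - 1 and 0 < x < size - 1 for x in range(size)]
            for y in range(size)]
    print(poisson_blend(target, source, mask)[2][2])
```

Each sweep touches every masked pixel once and many sweeps are needed for convergence, which is the repetition that makes a software implementation expensive.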
Security features of modern (SoC) FPGAs make it possible to protect the confidentiality of hardware and software IP when the devices are powered off, as well as to validate the authenticity of IP when it is loaded at startup. However, these approaches are insufficient, since attackers with physical access can also perform attacks during runtime, which demands additional security measures. In particular, RAM used...
Large number multiplication has always been an essential operation in cryptographic algorithms. In this paper, we propose Broken-Karatsuba multiplication, which applies the non-least-positive form to represent large numbers and exploits the parallelism hidden in conventional Karatsuba multiplication. Further, we modify the Montgomery modular multiplication algorithm with Broken-Karatsuba multiplication to make...
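For reference, the conventional Karatsuba multiplication that this abstract builds on can be sketched in Python as follows. This is the textbook algorithm, not the Broken-Karatsuba variant; the function name and the `threshold_bits` cut-off are assumptions made for this example.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply two non-negative integers with conventional Karatsuba recursion."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    # Small operands: fall back to the built-in multiplication.
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    low_mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & low_mask
    y_hi, y_lo = y >> half, y & low_mask
    # Three half-size products replace the four of schoolbook multiplication.
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    a, b = 3 ** 200, 7 ** 150
    print(karatsuba(a, b) == a * b)
```

The three sub-products `z0`, `z2`, and the middle product are mutually independent, which is the kind of parallelism a hardware design can exploit.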
A motion planning algorithm aims to calculate an obstacle-free trajectory which meets the dynamical constraints of a vehicle and leads the vehicle from the start state to the target state. RRT∗ (RRT star) is a sampling-based algorithm which is widely used in many applications because it finds a trajectory quickly. In contrast with the basic RRT (Rapidly-exploring Random Trees) algorithm,...
Undergraduate students rapidly implement a partially-reconfigured, real-time video processor on the Xilinx PYNQ board. The video processor performs various real-time operations including Sobel edge detection, embossing, averaging, an interactive Pong game, etc., using a separate partially-reconfigurable bit-stream for each distinct function. Selection of image-processing functions is accomplished...
Compared to classical HDL designs, generating FPGA designs with high-level synthesis from an OpenCL specification promises easier exploration of different design alternatives and, through ready-to-use infrastructure and common abstractions for host and memory interfaces, easier portability between different FPGA families. In this work, we evaluate the extent of this promise. To this end, we present a parameterized...
Stencil computations represent a highly recurrent class of algorithms in various high performance computing scenarios. The Streaming Stencil Time-step (SST) architecture is a recent implementation of stencil computations on Field Programmable Gate Arrays (FPGAs). In this paper, we propose an automated framework for SST-based architectures capable of achieving the maximum performance level for a given...
Recently, there has been an increased focus on integration of reconfigurable fabric with modern processors. However, existing soft-processors are optimized to leverage older FPGA fabrics, focus primarily on resource minimization and have fixed-pipeline designs that limit the scope for tightly integrated hardware accelerators. In this work, we present Taiga: a RISC-V, 32-bit, soft-processor architecture...
This paper presents the first area-optimized Montgomery modular multiplication module on low-power reconfigurable IGLOO® 2 FPGAs, from Microsemi. In order to obtain a good response time with few resources, the FPGA pipelined Math blocks and the embedded memory blocks are fully leveraged. As a result, 256-bit modular multiplications can be done in 2.33 μs, at a cost of 505 LUT4 cells, 257 Flip Flops,...
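As background for the operation being accelerated, the following Python sketch shows textbook Montgomery modular multiplication built on the REDC step. The function names, the default `k=256`, and the example modulus are assumptions for illustration; nothing here reflects the pipelined hardware design described in the abstract.

```python
def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for R = 2^k and 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k  # exact division by R, since t + m*n is a multiple of R
    return u - n if u >= n else u


def mont_mul(a, b, n, k=256):
    """Compute a * b mod n via Montgomery form; n must be odd and below 2^k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # -n^-1 mod R, Python 3.8+
    a_bar = (a * r) % n  # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a*b*R mod n
    return redc(prod_bar, n, k, n_prime)  # leave Montgomery form


if __name__ == "__main__":
    n = 2 ** 255 - 19  # example odd 255-bit modulus, chosen for this sketch
    a, b = 3 ** 150, 5 ** 100
    print(mont_mul(a, b, n) == (a * b) % n)
```

The reduction replaces division by `n` with masking and shifting by `k` bits, which is why the technique suits multiplier blocks and embedded memory on an FPGA. In practice the conversions into and out of Montgomery form are amortized over many multiplications rather than repeated per call as in this sketch.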
FPGAs have emerged as a cost-effective accelerator alternative in clouds and clusters. Programmability remains a challenge, however, with OpenCL being generally recognized as a likely part of the solution. In this work we seek to advance the use of OpenCL for HPC on FPGAs in two ways. The first is by examining a core HPC application, Molecular Dynamics. The second is by examining a fundamental design...
Modern computer architectures have an ever-increasing demand for performance, but are constrained in power dissipation and chip area. To tackle these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is...
In fields like embedded vision, where algorithms are computationally expensive, hardware accelerators play a major role in high-throughput applications. These accelerators can be implemented as hardwired IP cores or Application Specific Instruction-set Processors (ASIPs). While hardwired solutions often provide the best possible performance, they are less flexible than ASIP implementations. In this...
Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. The architectures are designed for high throughput even for the shortest packets. However, FPGA SoCs and Intel Xeon processors with FPGAs have limited resources...
This paper presents the authors' research work in the fields of embedded real-time softcore systems on FPGAs and a specialized optimizing assembly language compiler. With this softcore processor, we are targeting a highly specialized field of applications that require high floating-point precision and other unique characteristics. Therefore, a specialized optimizing assembly language compiler is...
In the past few years we have experienced an extremely rapid growth of modern applications based on deep learning algorithms such as Convolutional Neural Networks (CNNs), and consequently, an intensification of academic and industrial research focused on the optimization of their implementation. Among the different alternatives that have been explored, FPGAs seem to be one of the most attractive,...
In this paper, three different approaches are considered for FPGA-based implementations of the SHA-3 hash functions. While the performance of the proposed unfolded and pipelined structures just matches the state of the art, the dependencies of the structures which are folded slice-wise make it possible to further improve on the efficiency of the existing state of the art. By solving the intra-round dependencies caused...
Scene flow is a key function of stereo-based environment perception systems for mobile robotics and autonomous vehicles. Due to the heavy computing requirements and the limited computing resources, parallelized and embedded algorithms become quite important for mobile robotics applications. This paper develops a cross-platform embedded scene flow algorithm by using a coarse-grained software...
As line speeds and packet loss rates are sufficiently good for most applications, the reduction of latency and jitter is gaining in importance. We introduce and discuss the architecture of a novel networking device that provides low-latency switching and routing. It integrates an up-to-date FPGA with a standard x86-64 processor and targets Time-Sensitive Networking (TSN) and machine-to-machine communication...