FPGAs are widely used as hardware accelerators to speed up compute-intensive workloads. However, employing an accelerator in a virtualized environment adds complexity, because accessing the accelerator from virtual machines incurs significant overhead and sharing it requires careful consideration. We have implemented infrastructure to share an FPGA-based accelerator between multiple...
The Network Function Virtualization (NFV) paradigm promises to make networks more scalable and flexible by decoupling the network functions (NFs) from dedicated and vendor-specific hardware. However, network and compute intensive NFs may be difficult to virtualize without performance degradation. In this context, Field-Programmable Gate Arrays (FPGAs) have been shown to be a good option for hardware...
Visible light communication (VLC) has attracted much attention in recent years. This work presents an experimental visible light communication system, with its Media Access Control (MAC) layer implemented in digital signal processing, and a simple method to prevent a packet from being transferred twice between two or more Access Points (APs). The work is implemented in FPGAs (Field Programmable Gate Arrays), which are based...
Throughput, area and power optimized designs for the advanced encryption standard algorithm are proposed in this paper. The presented designs are suitable for the encrypt-only AES-128 algorithm. Both designs integrate pipelining and iterative architectures in one design. This is achieved through applying the concept of partial loop unrolling where iterations and multistage pipelining are used to optimize...
Convolutional neural networks (CNNs) are revolutionizing a variety of machine learning tasks, but they present significant computational challenges. Recently, FPGA-based accelerators have been proposed to improve the speed and efficiency of CNNs. Current approaches construct an accelerator optimized to maximize the overall throughput of iteratively computing the CNN layers. However, this approach...
We present a computer-aided design (CAD) tool that automatically connects an FPGA application using an embedded network-on-chip (NoC). After discussing the CAD flow steps, we delve into the details of implementing transaction communication using our CAD tool. This request-reply type of communication requires special consideration on FPGAs, for example: low round-trip latency, fair arbitration and...
Key-value stores (KVS) have become critical in many applications because of the recent data explosion. There is strong demand to improve the throughput and reduce the latency of KVS. FPGA-based parallel architectures can bring excellent performance and power efficiency. Cuckoo hashing has proven to be an efficient approach to implement KVS with good memory utilization and constant worst case access...
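The constant worst-case lookup mentioned in this abstract can be illustrated with a minimal software sketch of two-table cuckoo hashing. This is not the paper's FPGA design: the class name `CuckooHashTable`, the `max_kicks` limit, the grow-and-rehash policy, and the hash functions derived from Python's built-in `hash()` are all assumptions made for illustration.

```python
class CuckooHashTable:
    """Two-table cuckoo hash: every key lives in one of exactly two slots."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity      # slots per table (hypothetical default)
        self.max_kicks = max_kicks    # eviction limit before growing
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Illustrative hash functions: one per table, built on hash().
        return hash((which, key)) % self.capacity

    def get(self, key):
        # A lookup never needs more than two probes, one per table.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        # Otherwise insert, evicting occupants to their alternate table.
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, item[0])
            evicted = self.tables[which][idx]
            self.tables[which][idx] = item
            if evicted is None:
                return
            item, which = evicted, 1 - which
        # Too many evictions: enlarge both tables and reinsert everything.
        self._grow(item)

    def _grow(self, pending):
        entries = [e for table in self.tables for e in table if e is not None]
        entries.append(pending)
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in entries:
            self.put(k, v)
```

Insertion may trigger a chain of evictions, but `get` always inspects at most two slots, which is the property that makes the scheme attractive for fixed-latency hardware pipelines.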
Many FPGA-based accelerators are constrained by the available resources and multi-FPGA solutions can be necessary for building more capable systems. Available PCIe solutions provide only FPGA-to-Host communication. In this paper we present JetStream, an open-source modular PCIe 3 library, supporting not only fast FPGA-to-Host communication, but also allowing direct FPGA-to-FPGA communication which...
Sharing multi-cycle hardware blocks like the DSP48E1 primitive in Xilinx FPGAs can result in significant resource savings, but complicates scheduling. For high-throughput, DSP blocks must be pipelined, which results in a high initiation interval (II) for resource shared implementations. In this paper, we propose a resource reduction technique that minimises DSP block usage while also offering improved...
We explore the possibility of using shift register lookup tables (SRLs) for the implementation of Keccak on Xilinx FPGAs. The approach originates from the observation that the ρ step in combination with the state storage can be implemented as a collection of shift registers. This way, we achieve a slice-wise implementation using 25 shift registers of various lengths, resulting in 75 32-bit and 6 16-bit...
Packet processing in data centers, with respect to power consumption and efficient utilization of computational resources, is one of today's most important topics. The ARM architecture has proved to be an energy-efficient computational platform. Together with an integrated FPGA on a single die, it potentially offers high performance relative to power consumption. DPDK - a set of libraries...
Data acquisition (DAQ) is the process of acquiring analog signals from different types of sources and further processing the acquired signals in digital form on a personal computer (PC). Compared to traditional measurement systems, a PC-based DAQ system provides a more flexible and cost-effective measurement solution to the industry and utilizes the efficiency, processing power and connectivity capabilities...
Erasure coding, Reed-Solomon coding in particular, is a key technique to deal with failures in scale-out storage systems. However, due to the algorithmic complexity, the performance overhead of erasure coding can become a significant bottleneck in storage systems attempting to meet service level agreements (SLAs). Previous work has mainly leveraged SIMD (single-instruction multiple-data) instruction...
This paper proposes an FPGA-based hardware architecture for quadruple precision (QP) division arithmetic which can also process single, double and double-extended precision (SP, DP, DPE) computations. The mantissa division employs a series expansion methodology of division, integrated with a wide integer multiplier further optimized for FPGA implementations facilitating the built-in DSP blocks...
This paper describes the architecture and implementation of a high performance QR decomposition IEEE754 single precision floating point core, using a modified Gram-Schmidt algorithm. Using Intel's new floating point Arria 10 FPGAs, synthesis is used to generate column high functional units, giving O(n²) processing times. The modified Gram-Schmidt algorithm is expressed in a different order to combine...
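For readers unfamiliar with the algorithm named in this abstract, the following is a minimal sketch of the textbook modified Gram-Schmidt QR factorisation. It is not the reordered variant the paper describes, it uses Python's double-precision floats rather than IEEE754 single precision, and it assumes the input has full column rank; the function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(a):
    """Factor an m x n matrix (list of rows) as A = Q R by modified Gram-Schmidt.

    Assumes full column rank, so no diagonal entry of R is zero.
    """
    m, n = len(a), len(a[0])
    # Work on a column-major copy so each v[j] is one column of A.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalise the current column to obtain q_j.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Immediately remove the q_j component from all remaining columns.
        # Updating v[k] in place is what distinguishes the modified variant
        # from classical Gram-Schmidt.
        for k in range(j + 1, n):
            r[j][k] = sum(qx * vx for qx, vx in zip(qj, v[k]))
            v[k] = [vx - r[j][k] * qx for qx, vx in zip(qj, v[k])]
    # Convert Q back to row-major form.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

Each outer iteration performs one normalisation and a set of independent dot products and vector updates over the remaining columns, which is the kind of column-wise work a vector of hardware functional units could process in parallel.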
This paper describes the implementation of high-throughput FFTs on FPGAs, using a modified version of the Radix 2N architecture. The implementation uses a synthesis method which supports “super-sampling” to provide very high throughput. Special vector structures in the tools and hardware architecture are supported where complex vectors form the input on each clock cycle, and multiple...
In this project, a hardware implementation of the AES-256 encryption and decryption algorithm is proposed. The AES cryptography algorithm can be used to encrypt and decrypt blocks of 128 bits and is capable of using cipher keys of 256 bits. A feature of the proposed pipelined design is that the round keys, which are consumed in different rounds of encryption, are generated in parallel with...
Field-programmable gate arrays (FPGAs) are used in various systems that use reconfigurable functions. Conventional FPGAs have been developed using transistor-level descriptions to minimize routing delay. Although FPGAs developed by the register transfer level (RTL) design methodology provide various benefits to the designers of a system-on-a-chip (SoC), they have not been realized. Therefore, the...
The design of a pipelined Fast Fourier transform (PFFT) in modern communication systems provides an efficient way to compute the FFT with an area-efficient hardware architecture. Previously, radix-2² had been used only for single path delay feedback architectures. Subsequent research extended radix-2² to multi-path delay commutator (MDC) architectures. This paper...
Classification is one of the core tasks in machine learning and data mining. One of several classification models is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process,...