The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We explore the possibility of using shift register lookup tables (SRLs) for the implementation of Keccak on Xilinx FPGAs. The approach originates from the observation that the ρ step in combination with the state storage can be implemented as a collection of shift registers. This way, we achieve a slice-wise implementation using 25 shift registers of various lengths, resulting in 75 32-bit and 6 16-bit...
One of the most important topics of today is a packet processing in data centers with respect to the power consumption and efficient utilization of computational resources. The ARM architecture has proved to be an energy efficient computational system. Together with an integrated FPGA on a single die, it offers potentially a high performance with respect to the power consumption. DPDK - a set of libraries...
Data acquisition (DAQ) is the process of acquire analog signals from different types of sources and further process the acquired signals through personal computer (PC) in digital form. Compared to traditional measurement system, PC-based DAQ system provides a more flexible and cost-effective measurement solution to the industry and utilizes the efficiency, processing power and connectivity capabilities...
Erasure coding, Reed-Solomon coding in particular, is a key technique to deal with failures in scale-out storage systems. However, due to the algorithmic complexity, the performance overhead of erasure coding can become a significant bottleneck in storage systems attempting to meet service level agreements (SLAs). Previous work has mainly leveraged SIMD (single-instruction multiple-data) instruction...
This paper proposes a FPGA based hardware architecture for quadruple precision (QP) division arithmetic which can also process a single, a double and a double-extended precision (SP, DP, DPE) computations. The mantissa division employs a series expansion methodology of division, integrated with a wide integer multiplier further optimized for FPGA implementations facilitating the built-in DSP blocks...
This paper describes the architecture and implementation of a high performance QR decomposition IEEE754 single precision floating point core, using a modified Gram-Schmidt algorithm. Using Intel's new floating point Arria 10 FPGAs, synthesis is used to generate column high functional units, giving O(n2) processing times. The modified Gram-Schmidt algorithm is expressed in a different order to combine...
This paper describes the implementation of a high throughput FFTs implemented on FPGAs, using a modified version of the Radix 2N architecture. The implementation uses a synthesis method which supports “super-sampling” to provide very high throughput. Special vector structures in the tools and hardware architecture are supported where complex vectors form the input on each clock cycle, and multiple...
In this project, a hardware implementation of the AES-256 encryption and decryption algorithm is proposed. The AES cryptography algorithm can be used to encryption and decryption blocks of 128 bits and is capable of using cipher keys of 256 bits. Feature of the proposed pipeline design is depending on the round keys, which are consumed different round of encryption, are generated in parallel way with...
By leveraging the uneven distribution of traffic among network flows, the authors improve the query throughput of Cuckoo hashing. They achieve this by placing the most frequently used items in the table that's accessed first during the Cuckoo query operation. Their scheme is named Cuckoo cache, as it's conceptually similar to a cache but implemented inside the Cuckoo hash with little additional cost...
Field-programmable gate arrays (FPGAs) are used in various systems that use reconfigurable function. Conventional FPGAs have been developed by a transistor-level description for minimizing routing delay. Although FPGAs developed by the register transfer level (RTL) design methodology provide various benefits to the designers of a system-on-a-chip (SoC), they have not been realized. Therefore, the...
The design of pipelined Fast Fourier transform (PFFT) in modern communication systems provides an efficient way for computation of FFT with better area utilizing hardware architecture. Previously, the radix-22 had been used only for single path delay feedback architectures. Later with many types of research works the radix 22 was extended to multi-path delay commutator (MDC) architectures. This paper...
Classification is one of the core tasks in machine learning data mining. One of several models of classification are classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in rule induction process,...
We propose a novel, near-optimal data detection algorithm and a corresponding FPGA design for large multiple-input multiple-output (MIMO) wireless systems. Our algorithm, referred to as TASER (short for triangular approximate semidefinite relaxation), relaxes the maximum-likelihood (ML) detection problem to a semidefinite program and solves a non-convex approximation using a preconditioned forward-backward...
There is increasing interest for aerial vehicles to perform image processing tasks (i.e. object recognition and detection) in real-time. Such systems systems should have minimal data throughput, low computational complexity, and low-power. Traditional frame-based digital cameras are not ideal for meeting such specifications. More recent cameras, inspired by biology, drastically reduce data throughput...
In this paper, we propose a novel design for large-scale graph processing on FPGA. Our design uses large external memory for storing massive graph data and FPGA for acceleration, and leverages edge-centric computing principles. We propose a data layout which optimizes the external memory performance and leads to an efficient memory activation schedule to reduce on-chip memory power consumption. Further,...
Resource sharing attempts to minimise usage of hardware blocks by mapping multiple operations onto same block at the cost of an increase in schedule length and initiation interval (II). Sharing multi-cycle high-throughput DSP blocks using traditional approaches results in significantly high II, determined by structure of dataflow graph of the design, thus limiting achievable throughput. We have developed...
In this paper, we propose a novel approximate adder structure for LUT-based FPGA technology. Compared with a full featured accurate carry-ripple adder, the longest path is significantly shortened which enables the clocking with an increased clock frequency. By using the proposed adder structure, the throughput of an FPGA-based implementation can be significantly increased. On the other hand, the resulting...
Lenstra-Lenstra-Lovász (LLL) algorithm is a common technique for lattice reduction (LR) aided multiple-input multiple-output (MIMO) detectors. This paper presents the first VLSI implementation of a recently published Incremental fixed-complexity LLL algorithm (Incremental fcLLL) with fewer iterations than other existing fcLLL algorithms. We propose a modified Incremental fcLLL algorithm with simplified...
Restricted Boltzmann Machines (RBMs) are widely used in modern machine learning tasks. Existing implementations are limited in network size and training throughput by available DSP resources. In this work we propose a new algorithm and architecture for FPGAs called dropout-RBM (dRBM) system. Compared to the state-of-art design methods on the same FPGA, dRBM with a dropout rate 0.5 doubles the maximum...
Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed to date. Nonetheless, in the presence of natural faults, such architectures could lead to erroneous outputs. In this paper, we propose error detection schemes for CORDIC architectures used vastly in applications such as complex number multiplication, and singular value decomposition for signal and image processing...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.