We propose a novel, near-optimal data detection algorithm and a corresponding FPGA design for large multiple-input multiple-output (MIMO) wireless systems. Our algorithm, referred to as TASER (short for triangular approximate semidefinite relaxation), relaxes the maximum-likelihood (ML) detection problem to a semidefinite program and solves a non-convex approximation using a preconditioned forward-backward...
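For orientation, the ML detection problem and a textbook semidefinite relaxation can be written as below. This is a generic sketch for a real-valued system with symbols in {-1, +1}; the symbols y (received vector), H (channel matrix), s (transmit vector), N (number of transmit streams) and T are notation assumed here, and the formulation is the standard relaxation, not necessarily the triangular approximation that TASER solves.

```latex
% ML detection over the constellation \{-1,+1\}^N (assumed notation)
\hat{\mathbf{s}}^{\mathrm{ML}}
  = \arg\min_{\mathbf{s}\in\{-1,+1\}^{N}} \|\mathbf{y}-\mathbf{H}\mathbf{s}\|_2^2

% With \mathbf{x} = [\mathbf{s}^T,\, 1]^T the objective becomes a quadratic form:
\|\mathbf{y}-\mathbf{H}\mathbf{s}\|_2^2
  = \mathbf{x}^T \mathbf{T}\,\mathbf{x}
  = \operatorname{Tr}\!\left(\mathbf{T}\,\mathbf{x}\mathbf{x}^T\right),
\qquad
\mathbf{T} =
\begin{bmatrix}
  \mathbf{H}^T\mathbf{H} & -\mathbf{H}^T\mathbf{y} \\
  -\mathbf{y}^T\mathbf{H} & \mathbf{y}^T\mathbf{y}
\end{bmatrix}

% Replacing \mathbf{x}\mathbf{x}^T by \mathbf{S} and dropping the rank-1 constraint
% gives the semidefinite program:
\hat{\mathbf{S}}
  = \arg\min_{\mathbf{S}\succeq 0} \operatorname{Tr}(\mathbf{T}\mathbf{S})
  \quad\text{subject to}\quad \operatorname{diag}(\mathbf{S}) = \mathbf{1}
```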
There is increasing interest in aerial vehicles that perform image processing tasks (i.e. object recognition and detection) in real time. Such systems should have minimal data throughput, low computational complexity, and low power consumption. Traditional frame-based digital cameras are not ideal for meeting such specifications. More recent cameras, inspired by biology, drastically reduce data throughput...
In this paper, we propose a novel design for large-scale graph processing on FPGA. Our design uses large external memory for storing massive graph data and FPGA for acceleration, and leverages edge-centric computing principles. We propose a data layout which optimizes the external memory performance and leads to an efficient memory activation schedule to reduce on-chip memory power consumption. Further,...
Resource sharing attempts to minimise usage of hardware blocks by mapping multiple operations onto the same block, at the cost of an increase in schedule length and initiation interval (II). Sharing multi-cycle high-throughput DSP blocks using traditional approaches results in a significantly higher II, determined by the structure of the design's dataflow graph, thus limiting achievable throughput. We have developed...
In this paper, we propose a novel approximate adder structure for LUT-based FPGA technology. Compared with a full-featured accurate carry-ripple adder, the longest path is significantly shortened, which enables operation at a higher clock frequency. By using the proposed adder structure, the throughput of an FPGA-based implementation can be significantly increased. On the other hand, the resulting...
The Lenstra-Lenstra-Lovász (LLL) algorithm is a common technique for lattice reduction (LR) aided multiple-input multiple-output (MIMO) detectors. This paper presents the first VLSI implementation of a recently published Incremental fixed-complexity LLL algorithm (Incremental fcLLL) with fewer iterations than other existing fcLLL algorithms. We propose a modified Incremental fcLLL algorithm with simplified...
Restricted Boltzmann Machines (RBMs) are widely used in modern machine learning tasks. Existing implementations are limited in network size and training throughput by available DSP resources. In this work we propose a new algorithm and architecture for FPGAs called the dropout-RBM (dRBM) system. Compared to the state-of-the-art design methods on the same FPGA, dRBM with a dropout rate of 0.5 doubles the maximum...
Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed to date. Nonetheless, in the presence of natural faults, such architectures could lead to erroneous outputs. In this paper, we propose error detection schemes for CORDIC architectures, which are widely used in applications such as complex number multiplication and singular value decomposition for signal and image processing...
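As background for the entry above, a minimal software sketch of rotation-mode CORDIC is given below. It shows only the baseline iteration (rotation by shift-and-add style steps, as used for complex number multiplication), not the error detection schemes the paper proposes; the function name `cordic_rotate` and the default iteration count are assumptions made for illustration, and floating point stands in for the fixed-point arithmetic a hardware design would use.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate the vector (x, y) by `angle` radians using rotation-mode CORDIC.

    Converges for |angle| up to roughly 1.74 rad. Hypothetical helper,
    written in floating point for readability.
    """
    # Elementary rotation angles atan(2^-i) used by each micro-rotation.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation scales the vector by sqrt(1 + 2^-2i);
    # accumulate the total gain so it can be divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]

    return x / gain, y / gain


if __name__ == "__main__":
    # Rotating (1, 0) by 30 degrees approximates (cos 30°, sin 30°).
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
```

In hardware the multiplications by 2^-i become wire shifts, which is why a single faulty shift or add stage can silently corrupt the output.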
Recently, accelerating sorting using FPGAs has been of growing interest in both industry and academia. However, the supported data set size is usually small for FPGA-only sorting designs due to limited on-chip memory. In this paper, we propose a design to speed up large-scale sorting using a CPU-FPGA heterogeneous platform. We first optimize a fully pipelined merge-sort-based accelerator and employ...
To secure the data stored in large-scale Storage Area Network (SAN) applications, high throughput Advanced Encryption Standard (AES) encryption and decryption are required. However, this solution may take up more hardware resources, which limits scalability for future needs. To solve this problem, we develop a high throughput and resource efficient AES encryption/decryption based on FPGA, which...
ICEPOLE is a high-speed, hardware-oriented family of authenticated encryption schemes aimed at high-throughput data processing. It is one of the candidates admitted to the second round of the ongoing CAESAR competition for selecting dedicated authenticated encryption schemes. One goal of the second round is the evaluation of hardware implementations. Although ICEPOLE is designed for high-speed applications,...
Deep packet inspection (DPI) technology has been widely deployed in network intrusion detection systems (NIDS) to detect attacks or viruses. State-of-the-art NIDS use deterministic finite automata (DFA) algorithms to perform regular expression matching because of their stable matching speed. However, the throughput of traditional DFA algorithms is limited by the input width (usually one character at a time). Although the multi-stride...
In recent years, studies of DPI have been carried out actively. HTTP packets, which are a kind of DPI target, include GZIP compressed packets, and multi-streamed GZIP compressed HTTP cannot be analyzed directly on routers. Moreover, wire-rate processing is required to achieve on-router analysis. In this paper, an HTTP decompression architecture for routers supporting 40 Gbps networks is considered, and...
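To illustrate the throughput limit mentioned above, the following sketch runs a small DFA one character per step and then derives a 2-stride table that consumes two characters per step. The pattern `ab+c`, the three-letter alphabet, and all names (`DELTA`, `run_dfa`, `run_dfa_2stride`) are assumptions chosen for illustration; this is a generic multi-stride construction, not the scheme proposed in the paper.

```python
from itertools import product

DEAD = -1          # sink state for any mismatch
ALPHABET = "abc"   # assumed toy alphabet
ACCEPT = {3}

# One-character transitions for the full-match pattern ab+c.
# States: 0 = start, 1 = saw 'a', 2 = saw 'ab+', 3 = saw 'ab+c'.
DELTA = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2, (2, "c"): 3}


def step(state, ch):
    """Advance the DFA by one character; unknown transitions go to DEAD."""
    return DELTA.get((state, ch), DEAD)


def run_dfa(text):
    """Classic DFA matching: exactly one character consumed per step."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return state in ACCEPT


# 2-stride table: compose two single-character steps into one lookup.
STATES = [0, 1, 2, 3, DEAD]
DELTA2 = {
    (s, c1 + c2): step(step(s, c1), c2)
    for s in STATES
    for c1, c2 in product(ALPHABET, repeat=2)
}


def run_dfa_2stride(text):
    """Same language as run_dfa, but two characters consumed per step."""
    state = 0
    even_len = len(text) - len(text) % 2
    for i in range(0, even_len, 2):
        state = DELTA2.get((state, text[i:i + 2]), DEAD)
    if len(text) % 2:  # odd length: finish with one single-character step
        state = step(state, text[-1])
    return state in ACCEPT


if __name__ == "__main__":
    for sample in ["abc", "abbbc", "ac", "abca"]:
        print(sample, run_dfa(sample), run_dfa_2stride(sample))
```

The 2-stride table halves the number of lookups but grows with the square of the alphabet size, which is the usual trade-off of multi-stride matching.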
In this paper, we propose a high-throughput pipeline architecture of the stream cipher ZUC which has been included in the security portfolio of 3GPP LTE-Advanced. In the literature, the schema with the highest throughput only implements the working stage of ZUC. The schemas which implement ZUC completely can only achieve a much lower throughput, since a self-feedback loop in the critical path significantly...
Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited...
Implementation of Quasi-Cyclic (QC) Low Density Parity-Check (LDPC) decoders on FPGA devices has attracted great interest for both wireless communication and error correction for Flash memories. This paper presents an FPGA flooded LDPC decoder which uses multiple codeword processing for efficient memory utilization. It is based on a partially parallel implementation, which relies on memory blocks...
The paper presents a scalable architecture for fast emulation of Systems-on-Chip. It is implemented on a dedicated modular FPGA-based hardware platform. This verification ecosystem presents a new approach to improve efficiency of the verification process through hardware-based acceleration of tests. The system consists of dedicated hardware modules and third-party, easy-to-get evaluation boards to...
RF antenna array beamforming based on electronically steerable wideband phased-array apertures finds applications in communications, radar, imaging and microwave sensing. High-bandwidth requirements for wideband RF applications necessitate hundreds of MHz or GHz frame-rates for the digital array processor. A systolic architecture is proposed for the real-time implementation of the 2-D IIR beam filter...
In the modern world of digitization, processing of data in real time requires an increase in the operating speed of a system. The processing more often than not utilizes multiplication, which is time consuming and introduces a considerable amount of delay. As such, there is a need to reduce this delay and achieve faster real-time processing of data. This paper proposes a novel architecture for implementation...
Ensuring network traffic privacy and improved performance is a key factor in providing trustworthy communication for data transmission. In this research work, we investigate a mechanism to integrate our previous works, an open TCP/IP core and a cryptosystem, on the same Field-Programmable Gate Array (FPGA) chip. Challenges are addressed in this paper regarding the data format of the encrypted message...