Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...
Advanced cellular and wireless standards are rapidly expanding their instantaneous RF bandwidth requirements, and with operating frequencies moving into the mmWave spectrum, channels of 1 GHz and wider become increasingly likely. Furthermore, carrier aggregation and MIMO systems require multiple wideband channels, placing even higher demands on the system. No longer satisfied with short bursts of...
In this paper, compact memory strategies for a partially parallel quasi-cyclic LDPC (QC-LDPC) decoder architecture are proposed. Compacting the hard decisions and extrinsic messages of several adjacent rows into one memory entry not only reduces the number of memory banks for hard decisions but also facilitates multiple data accesses per clock cycle, which increases the throughput of the decoder. We...
The Keyed-Hash Message Authentication Code (HMAC) is a useful mechanism for message authentication. In this paper, a high-performance HMAC/SHA-3 processor that can generate both HMAC message digests and hash message digests is presented. It can generate message digests not only of the standard lengths (224, 256, 384, and 512 bits) but also of 64 bits. Due to the application of new generation...
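As a software reference point for the two digest types the abstract mentions, the following minimal sketch computes a plain SHA-3 digest and an HMAC/SHA-3 digest with Python's standard `hashlib` and `hmac` modules. It is not the processor described in the paper: the function names, the key, and the message are hypothetical, and the 64-bit digest is left out because the truncated abstract does not say how it is derived.

```python
import hashlib
import hmac

# Standard SHA-3 digest lengths (in bits) mapped to hashlib constructors.
SHA3_BY_BITS = {
    224: hashlib.sha3_224,
    256: hashlib.sha3_256,
    384: hashlib.sha3_384,
    512: hashlib.sha3_512,
}


def sha3_digest(message: bytes, bits: int = 256) -> bytes:
    """Plain (unkeyed) SHA-3 hash of the message."""
    return SHA3_BY_BITS[bits](message).digest()


def hmac_sha3_digest(key: bytes, message: bytes, bits: int = 256) -> bytes:
    """Keyed HMAC of the message, using SHA-3 as the underlying hash."""
    return hmac.new(key, message, SHA3_BY_BITS[bits]).digest()


if __name__ == "__main__":
    key = b"example-key"          # hypothetical key, for illustration only
    msg = b"example message"      # hypothetical message
    for bits in sorted(SHA3_BY_BITS):
        tag = hmac_sha3_digest(key, msg, bits)
        print(bits, tag.hex())
```

A receiver would recompute the tag with the shared key and compare it using `hmac.compare_digest`, which avoids leaking timing information during the comparison.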
This paper proposes a high-throughput architecture for AES encryption/decryption targeting recent FPGAs with 6-input LUTs. Unlike previous works, which share multiplicative-inverse logic to realize SubBytes and InvSubBytes, the proposed architecture directly employs a look-up-table-based S-box for both SubBytes and InvSubBytes. Efficient reordering and merging techniques are applied to achieve...
Current networks are changing very fast. Network administrators need more flexible and powerful tools to support new protocols or services quickly. The P4 language provides a new level of abstraction for flexible packet processing. Therefore, we have designed a new architecture for memory-efficient mapping of P4 match/action tables to FPGAs. The architecture is based on the DCFL algorithm and...
The interconnect is the Achilles heel of FPGAs. It currently dominates the delay and leads to high power consumption. It is thus imperative to take it into account when designing complex FPGA systems. In this work, we propose a learning-based method for data-flow systems built out of multiple directly connected individual components and find a set of optimal configurations with unique area vs. throughput...
Many CPU design houses, including Intel, IBM, and ARM, have added dedicated support for cryptography in recent processor generations. While adding accelerators and/or dedicated instructions boosts performance on cryptography, we are investigating a different approach that does not add extra silicon area: we study replacing the hardened NEON SIMD unit of an ARM Cortex-A9 with an identically sized FPGA...
Stream join is a fundamental and computationally expensive data mining operation for relating information from different data streams. This paper presents two FPGA-based architectures that accelerate stream join processing. The proposed hardware-based systems were implemented on a multi-FPGA hybrid system with high memory bandwidth. The experimental evaluation shows that our proposed systems can outperform...
Thanks to their excellent performance on typical artificial intelligence problems, deep neural networks have drawn a lot of interest lately. However, this comes at the cost of large computational needs and high power consumption. Benefiting from high precision at acceptable hardware cost on these difficult problems is a challenge. To address it, we advocate the use of ternary neural networks (TNN)...
This paper presents a time-delay system that originally has chaotic behavior, yet loses that dynamic due to the finite quantization levels of its state-variable representation. One method to overcome this destructive effect of digitization is to use a time-varying delay, which is studied in this paper. Based on this system, random number generator (RNG) topologies are demonstrated with better throughput...
A model of a Turbo-Product Code decoder architecture and a method for constructing such a decoder are proposed in this paper. The model describes decoder operation, taking into account the limitations of the hardware platform, and proposes re-use of components in the decoding process. The method provides a set of steps for decoder implementation. Field-Programmable Gate Array circuits are selected...
Reducing the configuration time of portions of an FPGA at run time is crucial in contemporary FPGA-based accelerators. In this work, we propose a method to increase the throughput for FPGA dynamic partial reconfiguration by using standard IP blocks. The throughput is increased by over-clocking the configuration bitstream circuitry beyond the limits stated in the specifications of these standard blocks...
Laser triangulation applications are commonly used for industrial quality control. Such applications require real-time systems, often made of a computing unit placed close to the image sensor and connected through a short, fast link. Choosing a camera with an integrated Field Programmable Gate Array (FPGA) as the computing unit can provide highly pipelined and parallel computing suited to processing images in real time. Moreover,...
A high-throughput architecture for the CCSDS 122.0-B-1 image compression standard is proposed. The architecture uses a novel memory organization to reduce the total number of memory operations and the number of individual memories, allowing operation without external memories. The architecture has been implemented on space-grade and commercial FPGA devices. It achieves 136 MSamples/sec on space grade...
Hash functions represent a fundamental building block of many network security protocols. The SHA-3 hashing algorithm is the most recently developed hash function, and the most secure. Implementing the SHA-3 hashing algorithm in a Hardware Description Language (HDL) is time-consuming and tedious to debug. On the other hand, High-Level Synthesis (HLS) tools offer potential solutions to the hardware...
Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. These architectures are designed for high throughput even for the shortest packets. However, FPGA SoCs and Intel Xeon processors with FPGAs have limited resources...
In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys an equality comparator matrix that compares the input items with one another to count them within a single clock cycle. The proposed architecture is applied to 8-bit items, that is, 256 distinct item values in total. The system is built and verified on the...
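The comparator-matrix idea can be modelled in software as follows. This is only a behavioural sketch under stated assumptions, not the paper's circuit: it assumes items arrive in small batches, that every pair in a batch is compared for equality, and that only the first occurrence of each value updates one of 256 counters. The names `comparator_matrix` and `count_batch` are hypothetical.

```python
from typing import List


def comparator_matrix(batch: List[int]) -> List[List[int]]:
    """Entry [i][j] is 1 when batch[i] equals batch[j], else 0.

    In hardware, each entry would be one equality comparator evaluated
    in parallel; here the entries are computed sequentially.
    """
    return [[int(a == b) for b in batch] for a in batch]


def count_batch(batch: List[int], counters: List[int]) -> List[int]:
    """Add the occurrences of each 8-bit item in the batch to its counter."""
    matrix = comparator_matrix(batch)
    for i, item in enumerate(batch):
        # Only the first occurrence of a value updates its counter, so
        # repeated values are not counted more than once per batch.
        is_first_occurrence = not any(matrix[i][:i])
        if is_first_occurrence:
            # The row sum is the number of times this value appears.
            counters[item] += sum(matrix[i])
    return counters


if __name__ == "__main__":
    counters = [0] * 256  # one counter per possible 8-bit value
    count_batch([3, 7, 3, 3, 255, 7], counters)
    print(counters[3], counters[7], counters[255])
```

The row sum of the matrix gives the multiplicity of an item inside the batch, which is why a whole batch can in principle be resolved in one step when all comparators work in parallel.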
To improve the throughput of error correction decoding for high-performance solid-state drives (SSDs), a semi-parallel low-density parity-check (LDPC) decoding architecture is proposed in this paper. The LDPC decoder circuit, which can be dynamically configured with bit rate and code length, is implemented using the scheduling control flow mode of single instruction multiple data...
A number of critical design decisions, such as network topology, buffer sizes, flow control mechanism and so on, have to be evaluated in any NoC design. The design and verification of NoCs are based on either software simulations, which are extremely slow and inaccurate for complex models, or hardware emulations using low/mid-class FPGAs, where the scalability of the NoC system is intensively...