The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. While existing HLS pipelining techniques obtain good performance with low complexity for regular loop nests, they provide inadequate support for effectively synthesizing irregular loop nests. For loop nests with dynamic-bound inner loops,...
In this paper, compact memory strategies for partially parallel Quasi-cyclic LDPC (QC-LDPC) decoder architecture are proposed. By compacting several adjacent rows hard decisions and extrinsic messages into one memory entry, which not only reduces the number of memory banks for hard decisions, but also facilitates multiple data accesses per clock cycle, the throughput of the decoder is increased. We...
Stream join is a fundamental and computationally expensive data mining operation for relating information from different data streams. This paper presents two FPGA-based architectures that accelerate stream join processing. The proposed hardware-based systems were implemented on a multi-FPGA hybrid system with high memory bandwidth. The experimental evaluation shows that our proposed systems can outperform...
Model of Turbo-Product Codes decoder architecture and method for construction of Turbo-Product Codes decoder are proposed in the paper. The model describes decoder functioning taking into account limitations of hardware platform and proposes re-use of components in the decoding process. The method provides set of steps for decoder implementation. Field-Programmable Gate Arrays circuits are selected...
Laser triangulation applications are commonly used for industrial quality control. Such algorithms require real-time systems often made of a computing unit close to the image sensor through a short and fast link. Choosing a camera with integrated Field Programmable Gate Array (FPGA) as the computing unit can provide high pipeline and parallel computing adapted to process image in real-time. Moreover,...
In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys the equality comparator matrix for comparing the input items with themselves to count them instantly within a single operating clock. The proposed architecture is applied to the case of the 8-bit item. That means 256 different types of items in total. The system is built and verified on the...
The need for efficient Finite Impulse Response (FIR) filters in high-speed applications targets Field Programmable Gate Arrays (FPGAs) as an effective and flexible platform for digital implementation. Although FIR filter offer advantages like linear phase characteristic, no feedback loops and good system stability, its convolution nature poises a challenge in parallelization due to data dependency...
In this paper, we push the limits in maximizing the throughput of side-channel-protected AES-GCM implementations on an FPGA. We present a fully unrolled and pipelined architecture that uses a Boolean masking countermeasure (specifically, threshold implementation) for first-order DPA resistance. Using a high-end Virtex-7 device, we obtain a throughput of 15.24 Gbit/s. Since masked implementations require...
Field Programmable Gate Arrays (FPGAs) excel at the implementation of local operators in terms of throughput per energy since the off-chip communication can be reduced with an application-specific on-chip memory configuration. Furthermore, data-level parallelism can efficiently be exploited through socalled loop coarsening, which processes multiple horizontal pixels simultaneously. Moreover, existing...
In this paper we present a complete, open-source GZIP compressor implementation for FPGA based on a systolic array architecture. GZIP is one of the most utilized compression algorithms. Besides the usual use-case of compression for data storage, distributed computing systems such as Hadoop utilize compression to reduce the amount of data which is transferred between computing nodes in a cluster. However,...
Text analytics has become increasingly important in the past few years because of the substantial growth in the amount of research, business, and government needs. An efficient text analytics system is likely to require high-powered regular expression matching (REGEX), as REGEX operations dominate the whole execution time. Some approaches have exploited the parallelism of graphic processing units...
Cipher-based message authentication code, CMAC, is a NIST approved standard for checking message integrity and authentication. This work presents a low-latency AES architecture for CMAC. The architecture uses intensive parallel processing per round and takes advantage of the BRAM present in modern FPGA. Experimental results show that for typical IoT application, the proposed architecture has a latency...
Markov Chain Monte Carlo (MCMC) algorithms are used to obtain samples from any target probability distribution and are widely used in stochastic processing techniques. Stochastic processing techniques such as machine learning and image processing need to compute large amounts of data in real-time, thus high throughput MCMC samplers are of utmost importance. Parallel Tempering (PT) MCMC has proven...
Hierarchical temporal memory (HTM) is the model of the neocortex functionality, developed by Numenta, Inc. The level of implementation does cover only the subset of actual neocortex layers functionality, but, however, is sufficient to be useful in different domain areas e.g. for a novelty or anomaly detection. Numenta provides their implementation of the HTM for commercial or research purposes as...
Multichannel active noise control (MCANC) systems are commonly used in acoustic noise or vibration control, such as large-dimension ventilation ducts, open windows and mechanical structures. However, its computational load far exceeds the capabilities of digital signal processors (DSPs) and microcontrollers. Even the field programmable gate array (FPGA) cannot straightforwardly cope with the exponential...
In recent years, connected component analysis (CCA) has become one of the vital image/video processing algorithms due to its wide-range applicability in the field of computer vision. Numerous applications such as pattern recognition, object detection and image segmentation involve connected component analysis. In the context of camera-based inspection systems, CCA plays an important role for quality...
Higher throughput is always desired in real time image processing applications. There are many ways to achieve higher throughput. However, if we have additional resources and memory bandwidth available, parallelism can be applied to achieve it. In this work, we have presented two image scanning methods that carry out parallelism to double the throughput of any architecture. Partitioned image scanning...
Current and future applications impose high demands on software-defined radio (SDR) platforms in terms of latency, reliability, and flexibility. This paper presents a heterogeneous SDR MPSoC with a hexagonal network-on-chip to address these issues. It features four data processing modules and a baseband processing engine for iterative multiple-input multiple-output (MIMO) receiving. Integrated memory...
The k-means clustering is one of the widely used algorithms in Data Mining and Machine Learning domains due to the simplicity, efficiency and scalability involved. The algorithm allocates N data-points or samples to k-clusters employing the minimum distances from respective cluster centroids. Distance calculation is intrinsically a computationally intensive task which is usually accelerated by using...
This paper presents initial results for a novel 128-antenna massive Multiple-Input, Multiple- Output (MIMO) testbed developed through Bristol Is Open in collaboration with National Instruments and Lund University. We believe that the results presented here validate the adoption of massive MIMO as a key enabling technology for 5G and pave the way for further pragmatic research by the massive MIMO community...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.