Accelerating sorting using dedicated hardware to fully utilize the memory bandwidth for Big Data applications has gained much interest in the research community. Recently, parallel sorting networks have been widely employed in hardware implementations due to their high data parallelism and low control overhead. In this paper, we propose a systematic methodology for mapping large-scale bitonic sorting...
Stream join is a fundamental and computationally expensive data mining operation for relating information from different data streams. This paper presents two FPGA-based architectures that accelerate stream join processing. The proposed hardware-based systems were implemented on a multi-FPGA hybrid system with high memory bandwidth. The experimental evaluation shows that our proposed systems can outperform...
Thanks to their excellent performance on typical artificial intelligence problems, deep neural networks have drawn a lot of interest lately. However, this comes at the cost of large computational needs and high power consumption. Benefiting from high precision at acceptable hardware cost on these difficult problems is a challenge. To address it, we advocate the use of ternary neural networks (TNN)...
The ever changing nature of network technology requires a flexible platform that can change as the technology evolves. In this work, a complete networking switch designed in OpenCL is presented, identifying several high-level constructs that form the building blocks of any network application targeting FPGAs. These include the notion of an on-chip global memory and kernels constantly processing data...
Text analytics has become increasingly important in the past few years because of the substantial growth in the amount of research, business, and government needs. An efficient text analytics system is likely to require high-powered regular expression matching (REGEX), as REGEX operations dominate the whole execution time. Some approaches have exploited the parallelism of graphic processing units...
This paper presents P5, a programmable packet parser with packet-level parallel processing for FPGA-based switches. P5 overcomes both limitations. First, P5 supports dynamically updating parsing algorithms at run-time. Second, P5 exploits packet-level parallelism at the bottleneck of the parsing pipeline to compensate for the FPGA’s low clock frequency, and reduces resource consumption through...
Higher throughput is always desired in real-time image processing applications. There are many ways to achieve higher throughput. However, if additional resources and memory bandwidth are available, parallelism can be applied to achieve it. In this work, we present two image scanning methods that exploit parallelism to double the throughput of any architecture. Partitioned image scanning...
k-means clustering is one of the most widely used algorithms in the Data Mining and Machine Learning domains due to its simplicity, efficiency and scalability. The algorithm allocates N data points or samples to k clusters based on the minimum distance from the respective cluster centroids. Distance calculation is intrinsically a computationally intensive task which is usually accelerated by using...
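The assignment step described in this abstract can be illustrated with a minimal software sketch. It is not the hardware design from the paper; the function name `assign_to_clusters` and the choice of squared Euclidean distance are assumptions made for illustration only.

```python
def assign_to_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Illustrative sketch: squared Euclidean distance is assumed, and the
    N x k distance loop is the part an accelerator would parallelize.
    """
    labels = []
    for point in points:
        # Distance from this point to every centroid.
        distances = [
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        ]
        # The point joins the cluster with the minimum distance.
        labels.append(distances.index(min(distances)))
    return labels


points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_to_clusters(points, centroids)
```

Every point-to-centroid distance is independent of the others, which is why this step lends itself to parallel evaluation.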
Data acquisition (DAQ) is the process of acquiring analog signals from different types of sources and further processing the acquired signals in digital form on a personal computer (PC). Compared to traditional measurement systems, a PC-based DAQ system provides a more flexible and cost-effective measurement solution to the industry and utilizes the efficiency, processing power and connectivity capabilities...
This paper describes the implementation of high-throughput FFTs on FPGAs, using a modified version of the Radix 2N architecture. The implementation uses a synthesis method which supports “super-sampling” to provide very high throughput. Special vector structures in the tools and hardware architecture are supported where complex vectors form the input on each clock cycle, and multiple...
In recent years, studies of DPI have been carried out actively. HTTP packets, which are one kind of DPI target, include GZIP-compressed packets, and multi-streamed GZIP-compressed HTTP cannot be analyzed directly on routers. Moreover, wire-rate processing is required to achieve on-router analysis. In this paper, an HTTP decompression architecture on routers supporting 40 Gbps networks is considered, and...
With the exponential growth of data size, data storage and analysis face more challenges due to the lack of disk capacity and the limited network bandwidth. Data compression techniques provide a good solution to mitigate these effects. In this paper, we propose a self-aware data compression system on FPGA for typical data warehousing, such as Hive, with column stored data and multi-threading...
Sorting is a key kernel in numerous big data applications, including database operations, graphs and text analytics. Due to low control overhead, parallel bitonic sorting networks are usually employed for hardware implementations to accelerate sorting. Although a typical implementation of a merge sort network can lead to low latency and small memory usage, it suffers from low throughput due to the lack...
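As a software illustration of the bitonic sorting networks mentioned above, here is a minimal recursive sketch. It is not the architecture proposed in the paper; the function names are hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_merge(values, ascending):
    """Sort a bitonic sequence with a fixed pattern of compare-exchanges."""
    n = len(values)
    if n <= 1:
        return list(values)
    half = n // 2
    low, high = list(values[:half]), list(values[half:])
    # Compare-exchange element i with element i + half. The pairs compared
    # do not depend on the data, which keeps control overhead low.
    for i in range(half):
        if (low[i] > high[i]) == ascending:
            low[i], high[i] = high[i], low[i]
    return bitonic_merge(low, ascending) + bitonic_merge(high, ascending)


def bitonic_sort(values, ascending=True):
    """Sort a sequence whose length is a power of two (assumed here)."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(values)
    half = n // 2
    # Build a bitonic sequence: first half ascending, second half descending.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return bitonic_merge(first + second, ascending)


result = bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])
```

All compare-exchanges within one loop of `bitonic_merge` are independent of each other, so a hardware network can perform them in the same clock cycle.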
Today's applications and services are becoming more dependent on fast wireless communication, and data-rate demands of 100 Gbit/s can easily be expected in the upcoming years. However, fulfilling that demand is a task which cannot simply be solved by upscaling existing technologies. While most of the research tackles the challenges regarding the transmission technology from the physical layer up to base-band...
The paper is devoted to the design of digital components for safety-related instrumentation and control systems using modern CAD tools. Traditionally, the digital components are built with matrix parallelism that reduces fault tolerance of circuits and safety of systems in their checkability. Circuits with bitwise pipeline data processing have an advantage in checkability, but are considered as less...
Image processing algorithms which only work on a local neighbourhood are used in nearly every image processing application. Very often several iterations are performed on a fixed neighbourhood, which leads to the description of stencil codes. A promising approach in embedded systems is to use the massively parallel computation power of an FPGA for this kind of algorithm. This not only speeds up processing...
NAND flash-based Solid State Drives (SSDs) have been widely deployed in cloud computing data centers due to their high performance compared with hard disks, while the limited lifespan of flash memory makes SSDs not very suitable for write-intensive applications. Deduplication is an effective method for reducing the write traffic of applications and thus can be used to extend the lifespan of SSDs...
In this paper, we have investigated pipeline and parallel processing architectures of finite impulse response (FIR) filters for efficient field programmable gate array (FPGA) implementation. Our simulation results show that the parallel processing architecture is more efficient than the pipeline architecture. Further, it is shown that fast FIR architecture is most suitable as compared to conventional...
Throughput is a key performance metric for streaming FFT architectures. However, increasing spatial parallelism to improve throughput introduces complex routing, thus resulting in high power consumption. In this paper, we propose a high-throughput, energy-efficient parallel FFT architecture based on the Cooley-Tukey algorithm. Multiple pipeline FFT processors using time-multiplexing are utilized to perform...
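For readers unfamiliar with the Cooley-Tukey algorithm named in this abstract, the following is a minimal radix-2 decimation-in-time sketch in software. It does not reproduce the paper's parallel, time-multiplexed architecture; the function name `fft` and the power-of-two length restriction are assumptions of this example.

```python
import cmath


def fft(samples):
    """Radix-2 Cooley-Tukey FFT; length is assumed to be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    output = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        output[k] = even[k] + twiddle
        output[k + n // 2] = even[k] - twiddle
    return output


impulse_spectrum = fft([1, 0, 0, 0])
constant_spectrum = fft([1, 1, 1, 1])
```

The two recursive calls are independent, which is the property that allows several smaller FFT processors to work side by side.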
Cryptography algorithms are ranked by their speed in encrypting/decrypting data and their robustness against attacks. Real-time processing of data encryption/decryption is essential in network-based applications to keep pace with the input data inhalation rate. The encryption/decryption steps are computationally intensive and exhibit a high degree of parallelism. Field programmable gate arrays...