Image processing applications require low power and high speed, so the convolution-based 1D-DWT is not desirable. In the proposed architecture, the modified 5/3 lifting algorithm is realized on an FPGA platform with optimizations. Latency and throughput are optimized with the modified algorithm. The architecture is modelled using HDL and implemented on FPGA. The proposal operates at 178 MHz and realised...
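For readers unfamiliar with the 5/3 lifting scheme named in the abstract above, the following is a minimal software sketch of the standard reversible (integer) 5/3 lifting steps with symmetric boundary extension. It is not the modified algorithm or the FPGA architecture from the paper, whose details are not given here; the function names `lift53_forward` and `lift53_inverse` are hypothetical and chosen for illustration only.

```python
def lift53_forward(x):
    """One level of the standard reversible 5/3 lifting transform.

    Assumes an even-length list of ints; boundaries use symmetric extension.
    Returns (s, d): approximation and detail coefficients.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length input with at least 2 samples")
    even, odd = x[0::2], x[1::2]
    m = len(odd)
    # Predict step: detail = odd sample minus the floored mean of its even neighbours.
    d = [odd[n] - ((even[n] + even[min(n + 1, m - 1)]) >> 1) for n in range(m)]
    # Update step: approximation = even sample plus a rounded quarter of adjacent details.
    s = [even[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(m)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward by running the lifting steps in reverse order."""
    m = len(s)
    # Reverse the update step to recover the even samples.
    even = [s[n] - ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(m)]
    # Reverse the predict step to recover the odd samples.
    odd = [d[n] + ((even[n] + even[min(n + 1, m - 1)]) >> 1) for n in range(m)]
    x = [0] * (2 * m)
    x[0::2], x[1::2] = even, odd
    return x
```

Because each lifting step only adds or subtracts a quantity computed from the other half of the samples, the inverse recovers the input exactly, and the whole transform needs only additions and shifts. That is the usual reason lifting is preferred over convolution-based filtering in hardware.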
Significant changes in traffic patterns often indicate network anomalies. Detecting these changes rapidly and accurately is a critical task for network security. Due to the large number of network users and the high throughput requirement of today’s networks, traditional per-item-state techniques are either too expensive when implemented using fast storage devices (such as SRAM) or too slow when implemented...
In this paper we present a model for predicting performance of a distributed, reconfigurable computing cluster using commodity parts, specifically the Digilent OpenSPARC development board, and the SIRC Framework, developed by Microsoft Research. The goal of this work is to assist in determining the feasibility of deploying a similar system for a given problem. This work is aimed at low-budget and introductory...
Nowadays Ethernet/IP based packet forwarding consists of a complex set of lookup schemes. A router/switch may have to support multiple such lookup schemes, depending on the location and specific operation of the device. Manual conversion of lookup schemes into a target architecture is slow and does not ensure an optimal allocation of FPGA resources for best performance. We develop an Integer Linear...
Modern cloud storage requires a high-throughput, low-latency data protection system, which is usually implemented with an Advanced Encryption Standard (AES) hardware accelerator connected to the CPU through PCI Express (PCIe). However, most existing systems cannot simultaneously achieve high throughput and low latency, as they impose conflicting requirements on the block size of packets used in PCIe...
This paper presents the design and implementation of a generic cyclic convolution architecture for imaging applications on field programmable gate array (FPGA). Two main architectures are presented. A parallel architecture using distributed arithmetic (DA) and a sequential architecture using FPGA digital signal processor (DSP) resources were implemented using VHSIC hardware description language...
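As a point of reference for the cyclic convolution named in the abstract above, here is a minimal sketch of the operation computed directly from its definition. It is a plain software model, not the distributed-arithmetic or DSP-based architecture from the paper; the function name `cyclic_convolve` is hypothetical.

```python
def cyclic_convolve(x, h):
    """Length-N cyclic (circular) convolution computed from the definition.

    y[i] = sum over k of x[k] * h[(i - k) mod N], for two sequences of equal length N.
    """
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    # Indices into h wrap around modulo N, which is what makes the convolution cyclic.
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


# Convolving with a unit impulse delayed by one sample rotates the input by one position.
print(cyclic_convolve([1, 2, 3, 4], [0, 1, 0, 0]))  # prints [4, 1, 2, 3]
```

The direct form costs N multiplications per output sample, which is the workload that parallel and sequential hardware architectures trade off between area and speed.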
Customisable data formats provide an opportunity for exploring trade-offs in accuracy and performance of reconfigurable systems. This paper introduces a novel methodology for mixed-precision comparison, which improves comparison performance by using reduced-precision data paths while maintaining accuracy by using high-precision data paths. Our methodology adopts reduced-precision data-paths for preliminary...
This paper focuses on the design and analysis of Field Programmable Gate Array (FPGA) hardware for Skein's tree hashing mode. Several approaches to modifying sequential hashing cores and creating scalable control logic in order to provide high-speed parallel hashing hardware are presented and analyzed. The results are compared to the current sequential designs of Skein, providing a complete analysis...
Matrix operations are required in many complex algorithms in digital, image and video processing applications. The conventional method is usually used to implement matrix multiplications for small matrices. However, with the development of VLSI technology and FPGAs, there is an increasing demand for developing a high speed, low power and low area matrix multiplication system for large matrices. The...
This paper presents a high-performance reconfiguration controller enhanced with the use of streaming lossless decompression in its data path. Two reconfiguration controllers are designed. The first is a generic controller that utilises standard concepts such as Direct Memory Access, burst mode transfer of data and interrupts to maximise throughput. This controller is then improved by the inclusion...
In FPGA-based adaptive computing, Inter-Process Communications (IPC) are required to exchange information among hardware processes which time-multiplex the resources in the same reconfigurable region. In this paper, we use pipes for IPC and analyze the performance in terms of throughput, throughput efficiency and latency in switching contexts. We also present two practical implementations using FPGA...
Space communication systems are characterized by severe limitations on on-board computational power and tight constraints on received signal strengths. Also, these systems observe degradation in signals caused by large propagation latencies, extreme distances traveled, as well as data corruption causing high bit-error rates. LDPC codes provide powerful error correction capability where...
Modern embedded devices are increasingly becoming multiprocessor with the need to support a large number of applications to satisfy the demands of users. Due to a huge number of possible combinations of these multiple applications, it becomes a challenge to predict their performance. This becomes even more important when applications may be dynamically started and stopped in the system. Since modern...
We present a Pareto efficient design method for multi-dimensional optimization of run-time reconfigurable streaming applications on CPU/FPGA platforms, which automatically allocates applications with optimized buffer requirement and software/hardware implementation cost. At the same time, application performance is guaranteed with sustainable throughput during run-time reconfigurations. As the main...
In conventional static implementations for correlated streaming applications, computing resources may be inefficiently utilized since multiple stream processors may supply their sub-results at asynchronous rates for result correlation or synchronization. To enhance the resource utilization efficiency, we analyze multi-streaming models and implement an adaptive architecture based on FPGA Partial Reconfiguration...
One of the obvious advantages of FPGA-based reconfigurable computing is the customizability of the tradeoff point between performance and hardware costs. However, this tradeoff has rarely been discussed at the whole-application level, which is the most important view for application users. This paper presents an empirical evaluation of a hardware module sharing technique which can shift the tradeoff point of area...
The Secure Hash Algorithm 512 (SHA-512), which is used to verify the integrity of a message, involves iterative computation on data. The large computation delay of each iteration limits the overall throughput of the system and makes it difficult to pipeline the computation. To shorten the computation time in an iteration of the main loop, we used the data forwarding method. Here we introduce...
Monte Carlo simulations and other scientific applications that depend on random numbers are increasingly implemented in parallel configurations in programmable hardware. High-quality pseudo-random number generators (PRNGs), such as the Mersenne Twister, are based on binary linear recurrence equations. They have extremely long periods (more than 2^1024 numbers generated before the entire sequence repeats)...
This paper describes 3-stage and 4-stage pipeline MD5 implementations on FPGA. This work removes the data dependency of a single step inside the main loop of the MD5 algorithm by data forwarding methodology, and breaks that single step computation into 3/4 pipeline stages. Three implementations on Xilinx Virtex-II are given, with throughput reaching 1.04 Gbps while occupying 1064 hardware slices. Thus,...