In this paper, three different approaches are considered for FPGA-based implementations of the SHA-3 hash functions. While the performance of the proposed unfolded and pipelined structures merely matches the state of the art, the dependencies of the structures that are folded slice-wise allow the efficiency to be improved beyond the existing state of the art. By solving the intra-round dependencies caused...
We present an approach for inserting latency-oblivious functionality into pre-existing FPGA circuits transparently. To ensure transparency — that such modifications do not affect the design's maximum clock frequency — we insert any additional logic post place-and-route, using only the spare resources that were not consumed by the pre-existing circuit. The typical challenge with adding new functionality...
NIST announced a public competition on Nov. 2, 2007 to develop a new cryptographic hash algorithm. Blake is one of the five finalists selected in round three of this competition. One of the major evaluation criteria for the candidate algorithms is efficient hardware implementation. In this paper, a compact, area-efficient design of the Blake-256 algorithm is implemented on an FPGA. Horizontal Folding...
Double-precision Floating Point (FP) arithmetic operations are widely used in many applications such as image and signal processing and scientific computing. Field Programmable Gate Arrays (FPGAs) are a popular platform for accelerating such applications due to their relatively high performance, flexibility and low power consumption compared to general-purpose processors and GPUs. Increasingly scientists...
FPGAs are increasingly being used to implement many new applications, including pipelined processor designs. Designers often employ memories to communicate and pass data between these pipeline stages. However, one-cycle communication between sender and receiver is often required. To implement this read-immediately-after-write functionality, bypass registers are needed by most FPGA memory blocks. Read...
Wave-pipelining enables a digital circuit to be operated at higher frequency. In the literature, only trial and error and manual procedures are adopted for the choice of the optimum value of clock and clock skew between the I/O registers of wave-pipelined circuits. The major contribution of this paper is the proposal for automating the above procedure for the ASIC implementation of wave-pipelined...
A new FPGA-based implementation scheme of the AES-128 (Advanced Encryption Standard, with 128-bit key) encryption algorithm is proposed in this paper. For maintaining the speed of encryption, the pipelining technology is applied and the mode of data transmission is modified in this design so that the chip size can be reduced. The 128-bit plaintext and the 128-bit initial key, as well as the 128-bit...
This paper presents a reliable processor pipeline architecture resilient to multiple soft and timing errors. It also presents a probabilistic quantification of its performance overheads. This reliable processor pipeline architecture has been implemented in the Leon3 VHDL open-source processor. An FPGA prototype running under random fault injection has also been developed. This reliable processor...
Today's FPGAs are capable of performing complex Image Processing schemes. In this paper we introduce a Configurable Zero Stall Image-Processing Pipelined Architecture. We define the handshake and discuss limitations resulting from configurability and complexity. We then present our solution for these issues, allowing a simple yet effective circuit where no delay is introduced even though the output...
A high-performance implementation of the 2-D lifting-based Discrete Wavelet Transform (DWT) for JPEG2000 applications is designed with a low-memory, highly pipelined architecture. The architecture consists of a row processor module, a column processor module and two memory modules. We present two new architectures, a row/column processor architecture and a memory architecture, one of which includes 7 dual-port RAMs. The For...
The H.264/AVC standard achieves much higher coding efficiency than previous video coding standards. Unfortunately, this comes at the cost of considerably increased complexity at the encoder, mainly due to motion estimation. Therefore, various fast algorithms have been proposed for reducing computation, but they do not consider how they can be effectively implemented in hardware. In this paper, we propose...
An equivalent optimized sub-pipelined architecture is proposed to implement the AES, in which every round, for both encryption and decryption, needs one clock cycle. The SubBytes/InvSubBytes operations are implemented using composite field arithmetic in GF(2^4) and BlockRAMs, respectively. In addition, an efficient key expansion which supports the output of a 128-bit key per cycle and allows key changes every cycle is also presented...
Due to the continuously decreasing cost of FPGAs, they have become a valid implementation platform for SOCs. Typically, a soft core processor implementation is used to execute the software parts of the SOC. As each system is individually designed for a particular application, it is natural to support compute-intensive parts of the code through customized hardware acceleration. Two different...
In this paper, we propose a high-performance processor for elliptic curve cryptography (ECC) over GF(2^163) using polynomial representation. It has three finite field (FF) RISC cores and a main controller to achieve instruction-level parallelism (ILP) with pipelining, so that the largely parallelized algorithm for elliptic curve point multiplication is well suited to this platform. Instructions for...
This paper presents the design and performance measurement of a hardware JPEG codec on an ARM926EJS emulation base board. JPEG is one of the best compression algorithms for still images. It preserves quality with a high compression ratio. The JPEG codec encodes and decodes coloured as well as grey image formats. The design exploits a pipeline architecture for high throughput. Overall size of the codec...
A digital architecture of a fuzzy processor is proposed. All blocks - fuzzy sets (triangular), rule strength calculation (minimum) and defuzzification (weighted sum) - were implemented in VHDL, verified and synthesized for FPGA. Implementation of the floating point division block appeared to be the most difficult part of the design. Partially concurrent and pipelined data flow provides competitive performance,...
Existing direct digital frequency synthesis (DDFS) can generate only one kind of waveform with one read-only memory lookup table (ROM LUT), and so cannot provide the in-phase and quadrature waves needed for Global Positioning System (GPS) carrier tracking. A DDFS is designed based on a ROM LUT. Using the symmetry of the sine wave and the property that cosine waveform is leading or lagging one fourth period compared...
This paper explains a new design of a high speed MIPS (Microprocessor without Interlocked Pipelined Stages) based processor with significant improvements on instruction-level parallelism (ILP) and stall reduction to zero. These improvements are accomplished by utilizing four-stage pipelining, multiple-issue technique, and a Branch Target Buffer. The processor functionality has been verified on Altera...
In this paper, a new FPGA design concept of a bilateral filter for image processing is presented. With the aid of this design, the bilateral filter can be realized as a highly parallelized pipeline structure with very good utilization of dedicated resources. The innovation of the design concept lies in sorting the input data into groups in such a manner that kernel-based processing is possible. Another feature...
DCT/IDCT has important applications in the field of image and signal processing. In this paper, we concentrate on a novel five-stage pipelined implementation, which consumes less power. The design uses Verilog HDL and is simulated in ModelSim 6.3b. Matlab is used to generate the data in binary format, which serves as the input data and cosine values for computing the 1D DCT/IDCT in HDL. There are other low...