Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementing the HLS-generated RTL and evaluating its performance involve lengthy logic synthesis and physical design flows. Moreover, the way different levels of coarse-grained parallelism are mapped onto spatial hardware parallelism affects the final FPGA-based performance both...
Coarse-Grained Reconfigurable Arrays (CGRAs) are typically very efficient for a single task. However, all functional units must operate in lockstep, which wastes resources and makes complex programming flows difficult. Massively Parallel Processor Arrays (MPPAs) excel at executing unrelated tasks simultaneously, but limit the amount of resources dedicated to a single task. We propose an architecture...
Problems involving network design arise in many real-world applications, such as power systems, vehicle routing, telecommunication networks, and phylogenetic trees. These problems involve thousands or millions of input variables and often require solutions in real time. In general, they are computationally hard (NP-hard). In this context, metaheuristics like evolutionary...
This paper presents the design and parametric testing of two FPGA-based direction-of-arrival estimation algorithms (Bartlett and Minimum Variance Distortionless Response) for use in an adaptive array antenna system. The algorithms were implemented on a Xilinx Virtex-5 FPGA and tested using a test bed that emulates signals coming from an 8-channel circular antenna head after being downconverted...
Breadth-first Search (BFS) is a fundamental graph problem. Due to the irregular nature of memory accesses to graph data structures, parallelization of BFS on cache-based systems leads to poor performance. Many issues, such as memory access latency, cache coherence policy, and inter-process synchronization, affect the throughput performance of BFS on such systems. In our proposed message-passing multi-softcore...
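For reference, the computation being parallelized can be sketched as a plain level-synchronous BFS in Python. This is a generic sequential sketch, not the message-passing multi-softcore design the abstract goes on to describe; the function name `bfs_levels` and the adjacency-dict graph representation are assumptions for illustration. The `v not in level` lookup is the irregular, data-dependent access that makes BFS hard to parallelize efficiently on cache-based systems.

```python
def bfs_levels(adjacency: dict[int, list[int]], source: int) -> dict[int, int]:
    """Level-synchronous BFS: return the hop distance of every reachable vertex."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Vertices in the current frontier can be expanded independently;
        # this loop is the part a parallel implementation would split across cores.
        for u in frontier:
            for v in adjacency.get(u, []):
                if v not in level:  # irregular, data-dependent memory access
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

For example, `bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: [4]}, 0)` assigns vertex 3 to level 2 even though it is reachable along two paths, because it is claimed by whichever frontier vertex reaches it first.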
The Smith-Waterman (SW) algorithm is the only optimal local sequence alignment algorithm. There are many SW implementations on FPGA, with reported speedups of up to 100x compared to a general-purpose processor (GPP). In this paper, we propose a design of the SW traceback that runs in parallel with the matrix fill stage and yields the optimal alignment after a single scan through the whole...
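A minimal Python sketch of textbook Smith-Waterman with a linear gap penalty shows the two stages referred to above. Note that it uses the conventional scheme, in which traceback is a separate pass after the matrix fill; it does not reproduce the paper's design, where traceback runs in parallel with the fill. The scoring values (match +2, mismatch -1, gap -1) and the function name are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best score, aligned a, aligned b) using a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Matrix fill: each cell depends on its left, upper and upper-left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback: walk back from the highest-scoring cell until a zero score is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The full score matrix is kept here only so the separate traceback pass can revisit it; avoiding that second pass is exactly what a fill-concurrent traceback targets.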
We have previously proposed the systolic computational-memory (SCM) architecture for high-performance, scalable computation based on finite difference methods. Although the SCM architecture has a completely parallel array structure, building a larger SCM array in practice requires many semiconductor devices, which favors a globally asynchronous and locally synchronous (GALS) design...
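To illustrate the kind of finite-difference computation such an array targets, here is a minimal Python sketch of Jacobi iteration for the 2D Laplace equation with a 5-point stencil. It is a generic example, not the SCM architecture itself; the function names are hypothetical. Each interior cell reads only its four nearest neighbours, which is why this work maps naturally onto a regular array of processing elements.

```python
def jacobi_step(grid: list[list[float]]) -> list[list[float]]:
    """One Jacobi sweep of the 5-point finite-difference Laplace stencil.

    Boundary cells are held fixed (Dirichlet conditions); each interior cell
    becomes the mean of its four nearest neighbours from the previous sweep.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so every update reads old values only
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve_laplace(grid: list[list[float]], sweeps: int) -> list[list[float]]:
    """Apply a fixed number of Jacobi sweeps."""
    for _ in range(sweeps):
        grid = jacobi_step(grid)
    return grid
```

Because every cell update in a sweep is independent, all interior cells can be updated simultaneously, with only neighbour-to-neighbour data exchange between sweeps.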
This paper introduces the design and verification of a novel array multiplier-accumulator architecture named ABACUS. The design priority in this architecture is low-energy operation rather than the traditional "performance-first" approach. ABACUS uses a threshold function to implement multiple fast carry operations in parallel through a cellular array, and therefore significantly deviates from...
To resolve the latency problem of implementing the Montgomery modular multiplication algorithm on a linear systolic array, this paper proposes an improved Montgomery algorithm and enhances the systolic array by incorporating a long carry-save adder (CSA) structure. This paper also proposes a series of methods to optimize the critical path and a non-waiting modular multiplication strategy which can...
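For context, the textbook radix-2 (bit-serial) Montgomery multiplication that systolic designs build on can be sketched in Python as follows. This is the standard sequential algorithm, not the improved algorithm or the CSA-based systolic array proposed in the paper; `montgomery_multiply` and `modmul` are hypothetical names. With R = 2^k and an odd modulus n, each step inspects only the least significant bit of the running sum to decide whether to add n, so no trial division is needed.

```python
def montgomery_multiply(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a * b * 2**(-k) mod n (n odd, 0 <= a, b < n < 2**k)."""
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    t = 0
    for i in range(k):
        # Add the partial product for bit i of a.
        t += ((a >> i) & 1) * b
        # If t is odd, adding the odd modulus makes it even, so the shift is exact.
        # Only the LSB is examined, which is what makes the scheme hardware-friendly.
        if t & 1:
            t += n
        t >>= 1
    # The loop keeps t < 2n, so one conditional subtraction gives the reduced result.
    return t - n if t >= n else t


def modmul(a: int, b: int, n: int, k: int) -> int:
    """Ordinary a * b mod n computed with two Montgomery products."""
    r2 = pow(2, 2 * k, n)                      # R^2 mod n, precomputed once per modulus
    a_mont = montgomery_multiply(a, r2, n, k)  # a * R mod n (Montgomery form of a)
    return montgomery_multiply(a_mont, b, n, k)  # a * R * b * R^-1 = a * b mod n
```

For example, `modmul(45, 76, 97, 7)` returns 25, which equals 45 * 76 mod 97. In practice operands stay in Montgomery form across many multiplications, so the conversion cost is amortized.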
This paper proposes a general-purpose multi-channel radar echo simulator that satisfies the varied needs of radar signal processor testing. The specificity of target echo models is the main obstacle to making a simulator general-purpose. The simulator therefore gives priority to generality, because generality is always the most attractive feature for...
This paper introduces a process for selecting among different test circuits through FPGA-controlled pull-in and disconnection of a relay matrix, so that multi-port equipment can be tested. To some extent, this design reduces circuit complexity and offers an increased ... compared with the test equipment previously used.
This paper presents the design and simulation of a Configurable Logic Block (CLB) using the Quantum-Dot Cellular Automata (QCA) technology. The modeling, implementation, and successful simulation of a CLB slice for a nano quantum FPGA are discussed. We have drawn comparisons with various FPGA architectures at the quantum level and optimized the proposed architecture with respect to area and latency...
We present the systematic design of two linear array IP cores for the k-nearest neighbor (k-NN) benchmark classifier. The need for real-time classification of data vectors with possibly thousands of features (dimensions) motivates the implementation of this widely used algorithm in hardware in order to achieve very high performance by exploiting block pipelining and parallel processing. The two linear...
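The underlying k-NN classifier can be sketched in a few lines of Python. This is a plain software reference, not either of the paper's linear array IP cores; the function name and the choice of squared Euclidean distance are assumptions. The per-vector distance computation is the independent multiply-accumulate work that hardware can pipeline and parallelize.

```python
from collections import Counter


def knn_classify(train: list[list[float]], labels: list[str],
                 query: list[float], k: int) -> str:
    """Classify `query` by majority vote among its k nearest training vectors.

    Uses squared Euclidean distance. Vote ties go to the label that appears
    first among the distance-sorted neighbours (i.e. the closer one).
    """
    # One multiply-accumulate per feature per training vector; each training
    # vector's distance is independent of the others.
    dists = [
        (sum((x - q) ** 2 for x, q in zip(vec, query)), lab)
        for vec, lab in zip(train, labels)
    ]
    dists.sort(key=lambda d: d[0])  # stable sort keeps training order on equal distances
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]
```

Squared distance is used because it preserves the neighbour ordering while avoiding a square root, a common simplification in hardware k-NN as well.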
Computing systems typically suffer from delay in data processing. This delay is caused by factors such as computational power, processor unit architecture, and synchronization signals. To enhance the performance of these systems by increasing their processing power, this paper presents a new architecture and clocking technique. This new architecture design, called Embedded Parallel Systolic Filters...
This paper describes the techniques used to describe and synthesize FPGA circuits expressed in a data-parallel domain-specific language (DSL) called Accelerator. We identify the subset of data-parallel descriptions that our system supports and explain how we track memory access patterns, which allows us to generate efficient FPGA circuits.
Real-time systems typically suffer from delay in data processing. This delay has many causes, such as computational power, processor unit architecture, and synchronization signals. To increase processing power, and hence performance, this paper presents a new architecture and clocking technique. This new architecture design, called Embedded Parallel Systolic...
Starting from sequential programs, we present an approach that combines data reuse, multi-level MapReduce, and pipelining to automatically find, within the design space on Field-Programmable Gate Arrays (FPGAs), the most power-efficient designs that meet speed and area constraints. This combined approach enables trade-offs in power, speed and area: we show that a 63% reduction in power can be achieved with a 27% increase...
The paper introduces novel field-programmable gate array (FPGA) circuits based on hybrid CMOS/resistive switching device (memristor) technology and explores several logic architectures. The novel FPGA structure is based on the combination of CMOL (CMOS + MOLecular-scale devices) FPGA circuits and recent improvements and generalization of the CMOL concept to allow multilayer crossbar integration, compatibility...
Reconfigurable mixed grain architectures have been demonstrated to be efficient and flexible for data parallel and computation-intensive applications. In this paper we present the design of a new Reconfigurable Cell (RC) based on a mixed-grain architecture. The architecture delivers a gate-level implementation of the Reconfigurable Logic Unit (RLU) focusing on the ALU implementation. The investigation...
The aim of this paper is to compare conventional multiplication and Vedic multiplication (using the Urdhva Tiryakbhyam Sutra) and to prove that implementing either on digital hardware requires the same number of multiplication and addition operations. The difference matters only for mental calculation. A few VHDL designs were developed for this purpose, and all multipliers were tested on 16 × 16 multiplications for comparison. Test vectors...
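The vertical-and-crosswise scheme of the Urdhva Tiryakbhyam Sutra can be sketched in Python as below, counting the single-digit products it forms. This is an illustrative software model, not the paper's VHDL; the function names are hypothetical. For operands with n and m digits the count comes out to n × m, the same number of digit products as conventional long multiplication, which is consistent with the claim that the sutra rearranges the work rather than reducing it.

```python
def to_digits(v: int, base: int) -> list[int]:
    """Digits of a non-negative integer v, least significant first."""
    digits = []
    while True:
        v, d = divmod(v, base)
        digits.append(d)
        if v == 0:
            return digits


def urdhva_multiply(x: int, y: int, base: int = 10) -> tuple[int, int]:
    """Vertical-and-crosswise multiplication; returns (product, digit products used)."""
    a, b = to_digits(x, base), to_digits(y, base)
    n, m = len(a), len(b)
    products = 0
    result, carry, place = 0, 0, 1
    # Column k sums the crosswise products a[i] * b[k - i] plus the incoming carry.
    for k in range(n + m - 1):
        column = carry
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            column += a[i] * b[k - i]
            products += 1
        carry, digit = divmod(column, base)
        result += digit * place
        place *= base
    result += carry * place  # any remaining carry forms the top digits
    return result, products
```

For example, `urdhva_multiply(123, 456)` returns `(56088, 9)`, and with `base=2` two 16-bit operands with all bits set use 16 × 16 = 256 bit products, exactly as many as the partial-product bits of a conventional array multiplier.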