A fundamental objective in the design of a network-on-chip is to minimize its area and power consumption while keeping the performance requirements at acceptable levels. The trade-offs involved in the process depend on the target technology, ASIC or FPGA. This paper presents a novel design approach to customize the routers in a network-on-chip for reconfigurable systems. More specifically, given a...
Sparse matrix by vector multiplication (SMV) is a key operation of many scientific and engineering applications. Field programmable gate arrays (FPGAs) have the potential to significantly improve the performance of computationally intensive applications which are dominated by SMV. A shortcoming of most existing FPGA SMV implementations is that they use on-chip Block RAM or external SRAM to store the...
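As a point of reference for the SMV operation named above, the following is a minimal software sketch. It assumes the compressed sparse row (CSR) storage format, which the abstract does not mention; the function name `csr_matvec` and the example matrix are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float],
               col_idx: Sequence[int],
               row_ptr: Sequence[int],
               x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format).

    values  : non-zero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored non-zeros of this row contribute to the dot product.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_matvec(values, col_idx, row_ptr, x)
print(y)  # [7.0, 4.0, 18.0]
```

The indirect read `x[col_idx[k]]` is the irregular memory access that makes the placement of the matrix and vector data (on-chip or off-chip) matter for an SMV implementation.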
In this paper we investigate several common bus architectures and measure effective bandwidth between High Performance Computing cores and off-chip memory. Contributions of this paper include (i) characterizing the behavior of four common organizations using off-the-shelf IP cores, (ii) an investigation of the effect of multiple computational cores sharing the bus structures, and (iii) the development...
This paper presents a high-speed implementation of a 2-D fixed-point discrete wavelet transform (DWT) using the embedded DSP48 blocks available on a Xilinx Virtex-4 XC4VLX15-10 FPGA. The full transform uses just 10 DSP48 blocks, 3 block RAMs and 2,126 logic elements when synthesized using Xilinx ISE Version 8.2i and can perform calculations at 197.2 MHz. The results clearly show that by using the...
This paper presents an architecture for the computation of the atan(Y/X) operation suitable for broadband communication applications where a throughput of 20 MHz is required. The architecture is based on LUT methods and achieves lower power consumption and lower latency than an atan(Y/X) operator based on the CORDIC algorithm. The proposed architecture can compute the atan(Y/X) with a latency...
In this work, we present a new structure for multiplication in finite fields. This structure is based on a digit-level LFSR (Linear Feedback Shift Register) multiplier, in which the area of the digit-multipliers is reduced using the Karatsuba method. We compare our results with other works in the literature for F_{3^97}. Furthermore, we propose new formulas for multiplication in F_{3^(6·97)}. These new...
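To illustrate the Karatsuba method named above, here is a minimal sketch of Karatsuba multiplication of polynomials with coefficients in GF(3), checked against schoolbook multiplication. It is not the paper's digit-level LFSR architecture: it omits the reduction modulo the field polynomial, and all function names and test vectors are hypothetical.

```python
P = 3  # characteristic of the coefficient field (assumed GF(3))


def schoolbook_mul(a, b):
    """Plain O(n^2) product of coefficient lists (lowest degree first) mod P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out


def poly_add(a, b, sign=1):
    """Return a + sign*b mod P, padding the shorter operand with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(u + sign * v) % P for u, v in zip(a, b)]


def karatsuba_mul(a, b):
    """Karatsuba product of two equal-length coefficient lists mod P."""
    n = len(a)
    assert len(b) == n, "this sketch assumes equal-length operands"
    if n <= 2:
        return schoolbook_mul(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    z0 = karatsuba_mul(a0, b0)                               # low * low
    z2 = karatsuba_mul(a1, b1)                               # high * high
    mid = karatsuba_mul(poly_add(a0, a1), poly_add(b0, b1))  # one extra product
    z1 = poly_add(poly_add(mid, z0, -1), z2, -1)             # cross terms
    out = [0] * (2 * n - 1)
    for offset, part in ((0, z0), (h, z1), (2 * h, z2)):
        for i, c in enumerate(part):
            out[offset + i] = (out[offset + i] + c) % P
    return out


a = [1, 2, 0, 1, 2]
b = [2, 2, 1, 0, 1]
print(karatsuba_mul(a, b) == schoolbook_mul(a, b))  # True
```

Each level replaces four half-size products with three, at the cost of extra additions, which is the trade that can reduce multiplier area in hardware.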
A high-performance RLS lattice filter with evaluation of the unknown order of the identified system was implemented as an accelerator PCORE for Xilinx EDK. The accelerator hardware can fully exploit parallelism in the algorithm and remove load from a microprocessor. The EDK integration allows effective programming and debugging of hardware-accelerated DSP applications. The optimal logarithmic number...
Modern FPGAs have become so affordable that they can be used to substitute ASICs in mass-produced devices. A key component of such a configurable system on a chip (CSoC) is the processor core. Available and usable cores are either 32 or 8 bits wide. Thus, there is a gap between these two extremes, which we want to fill with our SoC kit. In this contribution we elaborate on our SoC kit and its components...
The programmable clock networks in FPGAs have a significant impact on overall power, area, and delay. Not only does the clock network itself dissipate a significant amount of power, since it connects to every latch on the FPGA and toggles every cycle, but the design of the clock network also affects how efficiently the rest of the application can be implemented since it imposes constraints on the...
Modern platform FPGAs integrate fine-grained reconfigurable logic with processor cores and allow the creation of complete configurable systems-on-chip. However, design methodologies have not kept up with the rise in complexity of the target hardware. In particular, there is little overlap between the programming model for embedded software running on a real-time operating system and the programming...
This paper presents a technique to fix timing violations caused by process variations in FPGAs by adjusting the clock skews of flip-flops. This involves making the clock distribution network tunable by adding programmable delay elements to compensate for variations. We propose generic as well as chip-specific skew assignment schemes that are robust to variations. The two proposed schemes result in...
Simple algorithms can be analytically characterized, but such analysis is questionable or even impossible for more complicated algorithms, such as Model Predictive Control (MPC). Instead, Monte Carlo Arithmetic (MCA) enables statistical experimentation with an algorithm during runtime for detection and mitigation of numerical anomalies. Previous studies of MCA have been limited to software floating...
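The idea of statistical experimentation behind MCA can be sketched in software as follows. This is a minimal sketch that assumes the common formulation in which each value is randomly perturbed at a chosen virtual precision of `t` bits and the computation is repeated many times; the helper names, the precision, and the test expression are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def perturb(x: float, t: int, rng: random.Random) -> float:
    """Add uniform random noise to x at a virtual precision of t bits."""
    if x == 0.0:
        return 0.0
    e = math.frexp(x)[1]          # x = m * 2**e with 0.5 <= |m| < 1
    xi = rng.uniform(-0.5, 0.5)   # random perturbation
    return x + math.ldexp(xi, e - t)  # x + xi * 2**(e - t)


def mca_sub(a: float, b: float, t: int, rng: random.Random) -> float:
    """Subtraction with randomized inputs and a randomized result."""
    return perturb(perturb(a, t, rng) - perturb(b, t, rng), t, rng)


rng = random.Random(0)   # fixed seed so the experiment is repeatable
T = 24                   # assumed virtual precision (roughly single precision)
N = 1000                 # number of Monte Carlo samples

# Catastrophic cancellation: the true difference is 1e-8, which lies below
# the noise level that 24 bits of precision impose on values near 1.
samples = [mca_sub(1.00000001, 1.0, T, rng) for _ in range(N)]

mean = statistics.mean(samples)
spread = statistics.stdev(samples)
print(f"mean={mean:.3e} stdev={spread:.3e}")
```

A spread that is comparable to or larger than the mean signals that the result carries few or no significant digits at the chosen precision, which is the kind of numerical anomaly such experiments are meant to expose.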
Structured ASICs have emerged as a mid-way between cell-based ASICs with high NRE costs and FPGAs with high unit costs. Though the structured ASIC fabric attacks mask and other fixed costs, it does not solve verification problems, particularly physical verification issues with ASICs or logic errors missed by simulation, which would require re-spins. These can be avoided by testing in-system with an FPGA and migrating...
We are developing a set of reusable design blocks and several prototype systems for emulation of multi-core architectures in FPGAs. RAMP Blue is the first of these prototypes and was designed to emulate a distributed-memory message-passing architecture. The system consists of 768-1008 MicroBlaze cores in 64-84 Virtex-II Pro 70 FPGAs on 16-21 BEE2 boards, surpassing the milestone of 1000 cores in a...
In this paper, we investigate three different realizations of the same block from different points of view. The realizations include two with embedded processors (a custom 16-bit RISC processor and a general soft-core processor) and a third that uses Handel-C as an example of a synthesisable high-level abstraction language. The results show that development time...
This paper discusses the mapping of arrays in a high-level SystemC description to hardware. Normally, arrays are implemented as register files using general-purpose logic. Modern FPGAs, however, contain a large number of RAM blocks which can be used to implement arrays instead. Memories have a limited number of ports and mapping arrays to multiport memories involves assigning each array access to a port...
The computer industry is at a crossroads. The problems associated with scaling uniprocessor performance have forced all major computer manufacturers to turn to multi- and many-core architectures. This sea change in processor design has created many opportunities for field programmable logic. In the RAMP project, we are developing an affordable and versatile multiprocessor emulation platform being built...
Summary form only given. Today's FPGA applications are made up of many different functional elements; hardware blocks, software modules, I/O functions and on-chip interconnect fabrics are four major categories of these elements. I will explore some of the characteristics of these categories in order to provide insight into how the creation, or synthesis, of these functional elements can be automated...
Summary form only given. Over the past twenty years, FPGAs evolved from simple glue-logic chips to complex systems-on-a-chip. This change can be viewed as having distinct phases, each with different architecture, tools and methodology requirements. Are we now facing another phase change? Will FPGAs continue to evolve incrementally or are we about to see a radical change in field programmable logic? This...
Reconfigurable computing entails the utilization of a general-purpose processor augmented with a reconfigurable hardware structure (usually an FPGA). Normally, a complete reconfiguration is needed to change the functionality of the FPGA even when the change is minor. Moreover, the complete chip needs to be halted to perform the reconfiguration. Dynamic partial reconfiguration (DPR) provides the possibility...