Bufferless, deflection-routed Butterfly Fat Trees (BFTs) can outperform state-of-the-art FPGA overlay NoCs such as Hoplite by as much as 2–5× on throughput and ≈5× on worst-case latency at identical PE counts, and by ≈1.5× on throughput at identical resource costs (>16K LUTs) for statistical traffic patterns. In this paper, we show how to modify the tree connectivity and routing function to support...
This paper proposes a new synthesizable oscillator-based temperature sensor with minimal footprint for use in contemporary Xilinx FPGA devices. In contrast to previously published ring-oscillator architectures, based on inverters mapped onto single LUTs, the proposed oscillator uses an asynchronous Gray-coded 4-bit counter requiring only two 6-input LUTs. Due to its reduced hardware requirements,...
The latest published studies with extensive explorations of look-up table and cluster sizes are now more than a decade old. However, CMOS technology as well as CAD and transistor modeling tools have improved so much since then that it is reasonable to wonder whether the conclusions of such studies still hold. One of the major difficulties of conducting these studies, especially in academia, is producing...
Deflection-routed FPGA overlay NoCs such as Hoplite suffer from high worst-case routing latencies due to the penalty of deflections at large system sizes. Segmentation of communication channels in such NoCs can (1) reduce worst-case packet routing latencies for FPGA traffic, (2) enable efficient composition of multi-application NoC workloads, and (3) ease the burden of supporting Partial Reconfiguration...
In this paper, a novel way to finely tune a net delay on Xilinx Field Programmable Gate Arrays (FPGAs) is proposed. It consists of adding floating interconnects (nodes) to the net on which the delay is to be tuned, connected to any input pin of a switch matrix along the net. The nodes are added with a TCL script applied to an already placed and routed design. However, such nodes, also called antennas,...
Reconfigurable devices are attractive for several application fields thanks to their size, rapid prototyping characteristics, flexibility and upgradability. Thanks to Partial Reconfiguration features, FPGAs become the golden core of the adaptive computation paradigm, since they may dynamically change their functionality based on the processing request. Today, adaptive computation is mainly...
The significant increase of static power in the nano-CMOS era and, subsequently, the end of Dennard scaling have put up a Power Wall against further integration of CMOS technology in Field-Programmable Gate Arrays (FPGAs). An efficient solution to cope with this obstacle is power gating inactive fractions of a single die, resulting in Dark Silicon. Previous studies employing power gating on SRAM-based FPGAs have primarily...
String matching hardware engines generally utilize Ternary Content Addressable Memories (TCAMs). Although TCAM-based solutions are fast, they are expensive and power hungry. This paper proposes a high-performance memory-less architecture for string matching called Split-Bucket. It offers a performance comparable to TCAM-based solutions. Moreover, it is reconfigurable and scalable to the size of the...
Packing and placement are two crucial stages for FPGA realization. In the design flow, the basic logic units, such as look-up-tables (LUTs) and flip-flops (FFs), have to be merged into configurable logic blocks (CLBs) before placement. How the basic logic blocks are clustered in the packing stage has a great impact on the placement quality. This work presents an analytical placement framework for...
We can enhance the performance and efficiency of deflection-routed FPGA overlay NoCs by exploiting the cascading feature of the Xilinx UltraScale BlockRAMs. This allows us to (1) harden the multiplexers in the NoC switch crossbars, and (2) efficiently add buffering support to deflection-routing. While buffering is not required for correct operation of a deflection-routed NoC, it can boost network throughputs...
Static Random Access Memory (SRAM)-based routing multiplexers, whatever structure is employed, share a common limitation: their area, delay and power increase linearly with the input size. As a result, most SRAM-based FPGA architectures avoid the use of large multiplexers. Resistive Random Access Memory (RRAM)-based multiplexers, built with a one-level structure, have a unique...
Reducing worst-case routing latencies while delivering high throughput and low energy is a key design concern in the engineering of overlay packet-switched NoCs for FPGA fabrics. Deflection-routed torus NoCs are known to map particularly well to modern wire-rich FPGA substrates with fracturable LUT organizations while delivering high sustained bandwidths for various workloads and traffic patterns...
This paper presents a probabilistic methodology applied to the logical structures of FPGAs (Field Programmable Gate Arrays) known as basic building blocks, namely: CLBs (Complex Logic Blocks), usually consisting of LUTs (Look-up Tables) and bistables (flip-flops), PSWs (Programmable Switch Matrices), IOBs (Input-Output Blocks), SRAMs (Static Random Access Memories) and CBs (Connection Blocks). The scope...
This work analyzes the effect of the different design stages on the failure rate of circuits implemented in FPGAs. A bitstream-based SEU emulation platform is used to inject faults in order to analyze the critical bits of the circuit. Experiments are done on two different testbenches, an FIR filter and a CORDIC chain. Tests consist of loading different variations of the designs in order to estimate...
Most FPGAs use the Look-Up Table (LUT) as the basic logic block. The input sharing look-up table (ISLUT) architecture is a cluster architecture that can be configured as one 6-input LUT, two smaller LUTs or other modes. In this paper, several ISLUT architectures are added to the Verilog-to-Routing (VTR) tool for comparison with the standard 6-input basic logic element (BLE6) architecture. Experimental results show...
By analyzing the characteristics of the FPGA circuit along the space and time dimensions, the circuit is divided into several levels of “computing unit + memory/register”. An importance analysis model is proposed that combines the location importance, the connection degree among the nodes and each node's own soft error probability. The testing points are then optimized based on the importance of each...
In-place Polarity inVersion (IPV) has been proposed to mitigate soft errors induced by single event upsets (SEUs) in academic VPR FPGA architectures, and this paper extends the original IPV so that it can be used for commercial FPGA architectures. Unlike the original IPV, we use a new soft error model based on signal probability and propose a simple yet effective greedy algorithm. To...
This paper presents an FPGA architecture capable of implementing relative timing based asynchronous designs. Modifications are made to a traditional synchronous FPGA architecture to make it asynchronous capable, while retaining its capability as a fully functional synchronous FPGA. Such a design permits multi-frequency implementations. A test FPGA fabric is developed and evaluated with the implementation...
This paper presents a novel reconfigurable circuit capable of implementing the entire family of 4-phase latch protocols. The architecture utilizes look-up-table based reconfigurable logic structures and fixed signal paths. The implemented circuit creates a fabric to realize a variety of high speed and low power controllers for asynchronous circuits on FPGAs. The circuit is implemented on the IBM Artisan...
Continuous shrinking of transistor size to provide high computation capability along with low power consumption has been accompanied by reliability degradation due to, e.g., aging phenomena. In this regard, with their huge number of configuration bits, Field-Programmable Gate Arrays (FPGAs) are more susceptible to aging, since aging not only degrades performance but may additionally result in corrupting...