Modern systems are able to put two or more processors on the same die (Chip Multiprocessors, CMP), each with its private caches, while the last-level caches can be either private or shared. As these systems are affected by the wire delay problem, NUCA caches have been proposed to hide the effects of such delay in order to increase performance. A CMP system that adopts a NUCA as its shared last level...
In this paper, we introduce a 3-valued MVCM 4-phase link, where the cores at each end of the link use a 4-phase dual-rail protocol. The dual-rail N-bit data are encoded onto N + 1 wires on the link, thus reducing the number of interconnects between cores and improving power and crosstalk features. We show that it is impractical to encode a 2-phase dual-rail asynchronous data bit onto one wire using MVCM...
The quantum circuit design flow consists of two main tasks: synthesis and physical design. In current flows, the two tasks are performed sequentially: synthesis converts the design description into a technology-dependent netlist, and then physical design takes the fixed netlist, produces a layout, and schedules the netlist on the layout. This style of design suffers from limiting the optimization process...
In this paper, the use of residue arithmetic is proposed as a technique to reduce delay variation in adders. It is found that residue arithmetic offers a significant reduction in delay variation compared to adders in the literature. This technique can therefore be used to control the variance of critical-path delay, meet timing constraints efficiently, and thus improve timing yield. Experiments...
High-level synthesis is the process of balancing the distribution of RTL components throughout the execution of applications. However, a lot of balancing and optimization opportunities exist below RTL. In this paper, a coarse grain reconfigurable RTL component that combines a multiplier and a number of additions is presented and involved in high-level synthesis. The gate-level synthesis methodology...
In this paper we present constant multiplication architectures for the residue number system (RNS) moduli set {2^n-1, 2^n, 2^n+1} using the signed-digit (SD) representation for recoding the constant operand. The resulting circuits require a small number of partial products; hence, their area and delay are also small.
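As a rough illustration of the idea in the abstract above, and not the architecture from the paper, the Python sketch below recodes a constant into non-adjacent form (one common signed-digit recoding, chosen here as an assumption) and multiplies by it in each residue channel of the moduli set {2^n-1, 2^n, 2^n+1} using one shifted term per nonzero digit. The function names `naf_digits`, `const_mul_mod` and `rns_const_mul` are hypothetical.

```python
def naf_digits(c):
    """Non-adjacent form of a non-negative integer, least significant digit first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def const_mul_mod(x, c, m):
    """Compute (x * c) mod m, adding one shifted term per nonzero digit of c."""
    acc = 0
    for i, d in enumerate(naf_digits(c % m)):
        if d:
            # Each nonzero signed digit contributes one "partial product".
            acc = (acc + d * ((x << i) % m)) % m
    return acc


def rns_const_mul(x, c, n):
    """Multiply x by the constant c in each channel of {2^n-1, 2^n, 2^n+1}."""
    moduli = (2**n - 1, 2**n, 2**n + 1)
    return tuple(const_mul_mod(x % m, c, m) for m in moduli)
```

For example, the constant 7 recodes to 8 - 1, so it needs two shifted terms instead of three.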
This paper presents the design of an adiabatic/bootstrapped CMOS driver (xb-ad) using complementary pass-transistor logic (CPL) and a four-phase power clock. The proposed xb-ad uses a bootstrapped load driven circuit with PMOS and NMOS transistors driven by an NMOS evaluation logic block. When implemented on a 65 nm CMOS IV technology, under the large capacitive loading condition (16 pF), xb-ad performs...
Dependability is becoming a key design aspect of today's networked embedded systems (NESs) due to their increasing application to safety-critical tasks. Dependability evaluation must be based on modelling and simulation of faulty application behaviors, which must be related to faulty NES behaviors under actual defects. However, NESs behave differently from traditional embedded systems when testing...
Several regular parallel trees have been proposed over the years to optimize logic depth, area, fan-out and interconnect count for logic circuits. In this paper, we propose a comparative study of different parallel prefix trees used in the design of a new end-around carry (EAC) adder targeting FPGA technology. This new adder is based on the fast 128-bit binary floating-point EAC adder which has been...
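To make the terms in the abstract above concrete, the following Python sketch models an end-around carry addition built on a parallel prefix tree at the bit level. It uses the Kogge-Stone tree purely as an assumed example (the abstract does not name the trees it compares), and the helpers `kogge_stone` and `eac_add` are hypothetical names, not the paper's design.

```python
def kogge_stone(g, p):
    """Return group generate/propagate signals for bit ranges [i:0]."""
    n = len(g)
    G, P = g[:], p[:]
    d = 1
    while d < n:
        # Both lists on the right are built from the previous level's G and P.
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(n)],
        )
        d *= 2
    return G, P


def eac_add(a, b, n):
    """n-bit addition in which the carry-out is fed back as the carry-in."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(abits, bbits)]  # bit generate
    p = [x ^ y for x, y in zip(abits, bbits)]  # bit propagate
    G, P = kogge_stone(g, p)
    cout = G[n - 1]  # carry-out of the plain sum a + b
    result = 0
    for i in range(n):
        # Carry into bit i: generated below i, or the end-around carry
        # propagated through all lower bits.
        carry_in = cout if i == 0 else G[i - 1] | (P[i - 1] & cout)
        result |= (p[i] ^ carry_in) << i
    return result
```

In this model the all-ones pattern is kept as a second representation of zero, as in one's-complement arithmetic.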
Pulse-based data transmission has been demonstrated as a power-saving and high performance alternative to level-based signalling over global distances. Key to its correct operation is the use of reliable and low latency pulse generators. We propose a simple design of pulse generator, evaluate its performance and show a design that offers greater safeguards against malformed input signals. We show...
This paper presents the design of a highly area-efficient bootstrapped CMOS level shifter (vj-level shifter). The proposed vj-level shifter uses a single bootstrap capacitor to minimise active area and to maintain the voltage difference between the gates of the output pull-up PMOS and output pull-down NMOS transistors. When implemented on a 65 nm CMOS technology, under the large capacitive loading condition...
Currently, commercially available standard-cell libraries are often unstructured sets of cells, suitable for several optimization criteria: speed, power, leakage and area consumption. Exploiting a large number of items makes the synthesis process and the library maintenance quite demanding. By smartly selecting a reduced set of cells, such efforts can be reduced, without critically affecting performance...
Nowadays, sub-45 nm designs are facing the challenges of parametric yield loss and reliability issues. Existing design practices increase the system's area/power penalty in order to cope with the growing number of design corners and their widening distributions. Our proposed solution is the Standardized Knobs and Monitors (SKM) framework, which enables monitoring and adjusting the circuits at run-time...
Today's NoCs are reaching a level where it is getting very hard to ensure 100% of functionality. Consequently, fault tolerance has become an important aspect of today's design techniques and, like the system itself, it has to be validated and tested. A vulnerable point of attack for faults in distributed systems like NoCs is certainly the interconnect. In this paper, we will give an overview of today's...
This paper presents a novel simultaneous multithreading (SMT) VLIW DSP architecture with a dynamic dispatch mechanism to address the underutilization of computing resources in non-unit assumed latency (NUAL) VLIW DSPs. The SMT technology exploits the unused instruction slots by converting thread-level parallelism to instruction-level parallelism, improving the efficiency...
This paper presents the design of a highly efficient CMOS 2-input NAND (gcr-nand). When implemented on a 65 nm CMOS technology, under 1 pF capacitive loading condition, gcr-nand has a lower active area (3.4 times lower), and energy-delay product (56%) than the reference 2-input NAND (lscpl-nand). Furthermore, gcr-nand is able to operate under a high output load.
The implementation of an efficient result forwarding unit for asynchronous processors faces the problem of the inherent lack of synchronisation between result producer and consumer units. An efficient, full-custom solution to this problem has been proposed and implemented before (in the AMULET3 asynchronous processor) with the consequent limitations on design-space exploration and technology portability...
The need for small chip area in most handheld devices without sacrificing computational power introduces an interesting problem concerning expensive, computationally intensive operations, such as GF(2^k) inversion, which is widely used in cryptography. This paper addresses this problem by proposing a systolic inversion architecture for GF(2^k) fields. This architecture is based on an extended analysis...
We investigate a form of logic decomposition that generates a 2SPP-P-circuit, which includes two blocks representing the projected subfunctions obtained by Shannon cofactoring with respect to a chosen variable, and a block representing the intersection of the projections. The three blocks are implemented as minimal 2-SPP forms (XOR-AND-OR with XOR restricted to two inputs). The minimization is performed...
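The decomposition step described in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it only computes the two Shannon cofactors and their intersection as sets of minterms and checks that the function can be rebuilt from them; it does not perform the 2-SPP minimization, and the names `cofactors` and `rebuild` are hypothetical.

```python
from itertools import product


def cofactors(f, nvars, var):
    """Return (f0, f1, inter) as minterm sets over the remaining variables.

    f0 and f1 are the projections of f for var = 0 and var = 1,
    and inter is their intersection.
    """
    f0, f1 = set(), set()
    for bits in product((0, 1), repeat=nvars):
        if f(*bits):
            rest = bits[:var] + bits[var + 1:]
            (f1 if bits[var] else f0).add(rest)
    return f0, f1, f0 & f1


def rebuild(f0, f1, inter, var):
    """Recombine the blocks: x'.(f0 minus inter) + x.(f1 minus inter) + inter."""
    only0, only1 = f0 - inter, f1 - inter

    def g(*bits):
        rest = bits[:var] + bits[var + 1:]
        x = bits[var]
        return bool(
            (not x and rest in only0) or (x and rest in only1) or rest in inter
        )

    return g
```

The intersection block does not depend on the chosen variable, which is why it can be implemented once and shared by both branches.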