In Multi-Processor System-on-Chip (MPSoC) architectures equipped with shared memory, caches have a significant impact on performance and energy consumption. Indeed, if the executed application exhibits a high degree of reference locality, caches may reduce the number of shared-memory accesses and data transfers on the interconnection network. Hence, execution time and energy consumption can be greatly...
An SMT processor is designed to execute multiple threads simultaneously in order to gain higher performance by sharing resources such as ALUs and cache memory among several threads. However, sharing cache memory may cause thread conflict misses, which degrade performance. In this paper, an effective replacement strategy in which the conflict miss ratio among threads is controlled by limiting the...
This paper presents an innovative way to build flexible benchmarks based on micro-architecture independent characteristics. The proposed approach enables the testing and stressing of processors in order to reflect the real nature of applications and give meaningful information to the designers. The use of a limited number of basic blocks hand-coded in assembly, wisely chosen and arranged, enables...
Several regular parallel trees have been proposed over the years to optimize logic depth, area, fan-out and interconnect count for logic circuits. In this paper, we propose a comparative study of different parallel prefix trees used in the design of a new end-around carry (EAC) adder targeting FPGA technology. This new adder is based on the fast 128-bit binary floating-point EAC adder which has been...
Duplication and comparison has proven to be an efficient method for error detection. Based on this generic principle dual core processor architectures with output comparison are being proposed for safety critical applications. Placing two instances of the same (arbitrary) processor on one die yields a very cost efficient "single chip" implementation of this principle. At the same time, however,...
Utilizing a heterogeneous multiprocessor system has become a popular design paradigm for building an embedded system at low cost. One reliability issue, vulnerability to single event upsets (SEUs), has not been taken into account in a conventional IC (integrated circuit) design flow, while chip area, performance, and power consumption have been. This paper presents a system design paradigm...
In this paper, we present a novel method for merging sets of computational patterns into a reconfigurable cell respecting design constraints and optimizing specific design aspects. Each cell can then be used in a run-time reconfigurable processor extension. Our method uses constraint programming to define the pattern merging problem and therefore can easily include design constraints and optimize...
The implementation of an efficient result forwarding unit for asynchronous processors faces the problem of the inherent lack of synchronisation between result producer and consumer units. An efficient, full-custom solution to this problem has been proposed and implemented before (in the AMULET3 asynchronous processor) with the consequent limitations on design-space exploration and technology portability...
Current superscalar processors use a reorder buffer (ROB) to support speculation, precise exceptions, and register reclamation. Instructions are retired from this structure in program order, which may lead to significant performance degradation if a long latency operation blocks the ROB head. In this paper, a checkpoint-free out-of-order commit architecture is proposed, which replaces the ROB with...
We present a general methodology to implement a processor energy model, based on instruction-level characterization, and we apply it to a SPARC-based Leon3 processor. The model is characterized by simulating a back-annotated gate-level netlist and has two levels of accuracy: a coarse-grain estimation based on characterizing each single instruction and a fine-grain estimation accounting for the impact...
The main objective of this paper is to outline possible ways to substantially accelerate the calculation of the advection-diffusion equation (A-DE), which is commonly used to describe pollutant behavior in the atmosphere. The A-DE is a kind of partial differential equation (PDE), and in the general case it is usually solved by numerical integration due to its high complexity. These types...
The need for small chip area in most handheld devices without sacrificing computational power introduces an interesting problem concerning expensive, computationally intensive operations, like GF(2^k) inversion, which is widely used in cryptography. This paper addresses this problem by proposing a systolic inversion architecture for GF(2^k) fields. This architecture is based on an extended analysis...