An SMT processor is designed to execute multiple threads simultaneously in order to gain higher performance by sharing resources such as ALUs and cache memory among several threads. However, sharing cache memory may cause conflict misses between threads, which degrade performance. In this paper, an effective replacement strategy in which the conflict miss ratio among threads is controlled by limiting the...
The continuing downscaling of integrated circuits makes modern devices more susceptible to soft errors. This paper investigates the possibility of using Four-State Logic (FSL) to improve the fault tolerance of digital circuits. FSL is a possible implementation of asynchronous Quasi Delay Insensitive (QDI) logic using a more efficient encoding and handshake protocol. The behavior of asynchronous circuits...
In this paper, we propose an architecture, which we call GridRT, capable of handling the main features of ray tracing used for rendering three-dimensional scenes, such as shadow and reflection effects. This architecture achieves efficient overall performance while using a simple, compact, massively parallel design. The design exploits the Xilinx® Floating Point Operator IP Core...
This work presents an architecture to compute matrix inversions in a reconfigurable digital system, benefiting from the embedded processing elements present in FPGAs and using double-precision floating-point representation. The main module of this system is the processing component for Gauss-Jordan elimination. This component consists of smaller arithmetic units organized as a pipeline. These...
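For readers unfamiliar with the underlying algorithm, the following is a minimal software sketch of Gauss-Jordan matrix inversion on the augmented matrix [A | I], using Python's double-precision floats. It is not the hardware design from the abstract; partial pivoting and the function name `gauss_jordan_inverse` are assumptions added for numerical robustness and illustration.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix via Gauss-Jordan elimination (illustrative sketch)."""
    n = len(a)
    # Build the augmented matrix [A | I] in double precision.
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting (an assumption): choose the row with the largest |value|.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row (above and below).
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix now holds A^-1.
    return [row[n:] for row in m]
```

Each column step (pivot selection, row normalization, row elimination) is a regular sequence of multiply-subtract operations, which is what makes the method a natural fit for a pipelined arithmetic datapath.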
Modern handheld embedded systems operate under stringent power and real-time constraints. These systems run highly data-dominated applications from the multimedia and wireless domains. Most of these applications spend a significant amount of execution time in nested loops. In order to reduce the loop control overhead, several loop controller architectures have been proposed in the past. In this paper we...
The 2D Discrete Wavelet Transform (DWT) is a time-consuming kernel in many multimedia applications such as JPEG2000 and MPEG-4. The 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. The vertical filtering is easy to vectorize (assuming row-major order), but vectorizing the horizontal filtering requires many overhead instructions. In this...
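The row/column structure described above can be illustrated with a one-level 2D transform. This sketch uses the simple Haar (average/difference) filter pair as a stand-in; JPEG2000 actually uses the 5/3 and 9/7 wavelets, so the filters here are an assumption chosen for brevity, and even image dimensions are assumed.

```python
def haar_1d(x):
    """One Haar level on a 1-D list: averages (low band) then differences (high band)."""
    half = len(x) // 2
    lo = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    hi = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return lo + hi

def dwt2_haar_one_level(img):
    """One level of a separable 2-D Haar DWT on a row-major image (even sizes assumed)."""
    # Horizontal pass: each output needs two *adjacent* elements of the same row,
    # which in SIMD code requires shuffles/deinterleaving (the overhead noted above).
    rows = [haar_1d(r) for r in img]
    h = len(rows) // 2
    # Vertical pass: combines two *whole rows* element-wise, so every SIMD lane
    # performs the same operation on contiguous data.
    lo = [[(a + b) / 2 for a, b in zip(rows[2 * i], rows[2 * i + 1])] for i in range(h)]
    hi = [[(a - b) / 2 for a, b in zip(rows[2 * i], rows[2 * i + 1])] for i in range(h)]
    return lo + hi  # quadrants: LL | HL on top, LH | HH below
```

For example, `dwt2_haar_one_level([[1, 2], [3, 4]])` gives `[[2.5, -0.5], [-1.0, 0.0]]`, where 2.5 is the mean of the block.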
This paper presents a novel simultaneous multithreading (SMT) VLIW DSP architecture with a dynamic dispatch mechanism to address the underutilization of computing resources in non-unit assumed latency (NUAL) VLIW DSPs. The SMT technology exploits unused instruction slots by converting thread-level parallelism into instruction-level parallelism, improving the efficiency...
A compound instruction, encoding several ALU or memory operations within a single instruction word, has been regarded as an efficient way of improving performance. In compilers for embedded processors, code generation algorithms for compound instructions have been built mainly around instruction selection, which is a crucial phase of code generation. In this paper, we propose an iterative code...
In this paper, a low-latency on-chip communication network (NoC) for a run-time reconfigurable (RTR) grid inside dynamically and partially reconfigurable (DPR) FPGAs is proposed, which supports the arbitrary placement of run-time reconfigurable modules (RTRMs) inside the grid. The dedicated, fully meshed silicon network should support the arrangement of communication channels between the RTRMs within...
The implementation of an efficient result forwarding unit for asynchronous processors faces the problem of the inherent lack of synchronisation between result producer and consumer units. An efficient, full-custom solution to this problem has been proposed and implemented before (in the AMULET3 asynchronous processor) with the consequent limitations on design-space exploration and technology portability...
Current superscalar processors use a reorder buffer (ROB) to support speculation, precise exceptions, and register reclamation. Instructions are retired from this structure in program order, which may lead to significant performance degradation if a long latency operation blocks the ROB head. In this paper, a checkpoint-free out-of-order commit architecture is proposed, which replaces the ROB with...
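The head-of-line blocking problem motivating this work can be shown with a toy model of in-order retirement. This sketch is illustrative only: it assumes every instruction is already in an unbounded ROB, that `finish_times[i]` is the cycle instruction `i` completes, and that up to `width` instructions retire per cycle. It models the conventional ROB, not the proposed out-of-order commit scheme; the function name is hypothetical.

```python
from collections import deque

def in_order_commit(finish_times, width=4):
    """Return the retirement cycle of each instruction under in-order ROB commit."""
    rob = deque(enumerate(finish_times))  # head of the deque = oldest instruction
    retire = [None] * len(finish_times)
    cycle = 0
    while rob:
        cycle += 1
        retired_this_cycle = 0
        # Only the head may retire; a still-executing head blocks all younger entries.
        while rob and retired_this_cycle < width and rob[0][1] <= cycle:
            idx, _ = rob.popleft()
            retire[idx] = cycle
            retired_this_cycle += 1
    return retire
```

With `finish_times = [100, 2, 2, 3]` and `width=2`, the three short instructions finish by cycle 3 but cannot retire until the long-latency head does: the result is `[100, 100, 101, 101]`. Entries held this long are what fill the ROB and stall the front end.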
We present a general methodology to implement a processor energy model, based on instruction-level characterization, and we apply it to a SPARC-based Leon3 processor. The model is characterized by simulating a back-annotated gate-level netlist and has two levels of accuracy: a coarse-grain estimation based on characterizing each single instruction and a fine-grain estimation accounting for the impact...
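The coarse-grain level described above amounts to summing a characterized per-instruction energy over an executed instruction trace. The sketch below shows only that level; the energy table contains hypothetical placeholder values (not Leon3 characterization data), and the fine-grain refinement mentioned in the abstract is not modeled.

```python
from collections import Counter

# Hypothetical per-instruction energies in picojoules (placeholders, not measured data).
ENERGY_PJ = {"add": 10.0, "ld": 25.0, "st": 22.0, "mul": 30.0}

def coarse_grain_energy(trace, energy_table):
    """Coarse-grain estimate: E = sum over opcodes of (count of opcode) * (energy per opcode)."""
    counts = Counter(trace)
    return sum(n * energy_table[op] for op, n in counts.items())
```

For the trace `["add", "ld", "add"]` with the table above, the estimate is 2 × 10.0 + 25.0 = 45.0 pJ.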
Configurable coprocessors have been an active research area for some time. The limited instruction word length and the limited number of operands per instruction have become potential performance bottlenecks for traditional SIMD extensions. In this paper, we use LEON-2 as the host platform and present a novel low-cost architecture with extended shadow_f registers. In each extended instruction,...
The constant evolution of standards and applications, usually implemented on systems-on-chip (SoCs), increases architectural performance and flexibility requirements. Current architectures are consequently becoming more complex and difficult to develop. One solution is to develop design frameworks based on high-level architecture description languages (ADLs). These ADLs are useful for a rapid description...
Since Frame Rate Up-Conversion (FRC) has started to be used in recent consumer electronics products such as High Definition TVs, real-time and low-cost implementation of FRC algorithms has become very important. Therefore, in this paper, we propose a low-cost hardware architecture for real-time implementation of frame interpolation algorithms. The proposed hardware architecture is reconfigurable and it...
This paper presents a method to reduce the number of input variables needed to represent incompletely specified index generation functions. A compound variable is generated by EXORing the original input variables. By using both original and compound variables, incompletely specified index generation functions can be represented with fewer variables. As a means to select variables, a heuristic method using information...
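The benefit of compound variables can be checked on a small case. For an incompletely specified index generation function, only the registered vectors must map to distinct values; all other inputs are don't-cares, so a set of variables suffices if it separates every pair of registered vectors. The sketch below uses exhaustive search over original variables and pairwise XORs. That search strategy and the helper names are assumptions for illustration, not the paper's heuristic.

```python
from itertools import combinations

def candidate_vars(n, allow_compound=True):
    """Original variables x_i as (i,), plus compound variables x_i XOR x_j as (i, j)."""
    singles = [(i,) for i in range(n)]
    return singles + list(combinations(range(n), 2)) if allow_compound else singles

def evaluate(var, vec):
    """Value of a (possibly compound) variable: XOR of the selected input bits."""
    v = 0
    for i in var:
        v ^= vec[i]
    return v

def distinguishes(vars_, vectors):
    """True if the chosen variables give every registered vector a distinct pattern."""
    seen = set()
    for vec in vectors:
        key = tuple(evaluate(v, vec) for v in vars_)
        if key in seen:
            return False
        seen.add(key)
    return True

def min_representation(vectors, allow_compound=True):
    """Smallest variable set (by exhaustive search) that separates all registered vectors."""
    cands = candidate_vars(len(vectors[0]), allow_compound)
    for size in range(1, len(vectors[0]) + 1):
        for combo in combinations(cands, size):
            if distinguishes(combo, vectors):
                return combo
    return None
```

For the registered vectors 000, 001, 010 and 100, every pair of original variables leaves a collision, so three are needed; with compound variables, the pair (x1 ⊕ x2, x1 ⊕ x3) already maps the four vectors to 00, 01, 10 and 11.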
Hash functions are widely used in, and form an important part of, many cryptographic protocols. Currently, a public competition is underway to find one or more new hash algorithms for inclusion in the NIST Secure Hash Standard (SHA-3). The computational efficiency of the algorithms in hardware will form one of the evaluation criteria. In this paper, we focus on five of these candidate algorithms, namely CubeHash,...
In this paper, we present a fast, low-power, low-energy, standard public-key cryptography processor for use in power- and energy-limited applications. The proposed prime-field elliptic-curve cryptography hardware uses a modified Montgomery modular inverse algorithm to minimize the total calculation time and is completely flexible in terms of the field and curve parameters. The power consumption is minimized...
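As background, the sketch below implements Kaliski's classic Montgomery inverse (phase 1), which returns x = a⁻¹·2ᵏ mod p using only shifts, additions and subtractions, followed by k modular halvings to recover the ordinary inverse. This is the unmodified textbook algorithm, not the paper's modified version, and it assumes p is an odd prime and 0 < a < p.

```python
def montgomery_inverse_phase1(a, p):
    """Kaliski phase 1: return (x, k) with x = a^-1 * 2^k mod p (p odd prime, 0 < a < p)."""
    if p % 2 == 0 or not 0 < a < p:
        raise ValueError("need odd modulus p and 0 < a < p")
    u, v, r, s, k = p, a, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= p:
        r -= p
    return p - r, k

def halve_mod(x, p):
    """Divide x by 2 modulo odd p: if x is odd, add p first so the sum is even."""
    return x // 2 if x % 2 == 0 else (x + p) // 2

def mod_inverse(a, p):
    """Ordinary inverse a^-1 mod p: strip the 2^k factor with k modular halvings."""
    x, k = montgomery_inverse_phase1(a, p)
    for _ in range(k):
        x = halve_mod(x, p)
    return x
```

For p = 7 and a = 3, phase 1 returns (3, 4), since 5·2⁴ mod 7 = 3; four halvings then give 5, the inverse of 3 modulo 7. In hardware, the branch structure maps onto shifters and adders, which is why Montgomery-style inversion is attractive for ECC datapaths.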