Semiconductor process, device, and circuit simulation plays a significant role in the development of integrated circuit technology, providing quick prototyping and quantities not easily measured. Predictive models allow optimization of chip designs and are an essential component of the technology design process. As semiconductor devices shrink into the deep-submicron regime, device simulation faces...
Designers must pay careful attention to physical design details when designing high-speed circuits (≥1 GHz) because signal interactions become significant at high frequencies. Simulating these interactions is computationally demanding and requires the use of fast, efficient simulation algorithms and high-performance computers. Ensuring that these interactions do not degrade performance requires...
Describes an accurate and efficient simulator for a superscalar processor, the IBM RISC System/6000. This simulator was developed to obtain timing estimates for the execution of programs in an execution-driven simulation system. The simulator uses a new technique called in situ profiling, and a simplified runtime model of the RS/6000 processor to obtain dynamic timing estimates. The runtime model...
Trace driven simulation is a well known method for evaluating computer architecture options and is the technique of choice in most published cache and memory studies. Ideally, a trace should contain all the necessary events generated by a program. However, this is usually impractical for all but the most trivial of programs because of trace storage and simulation time costs. As computer systems increase...
Simulating multi-level cache hierarchies is much more complicated than simulating a single-level cache, so efficient simulation methods for analyzing the performance of multi-level cache hierarchies are highly desirable. A one-pass simulation method for multi-level unified cache hierarchies is developed, based on the stack model and inclusion properties. Recently, many...
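The stack model mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's multi-level method; it is the classic single-level idea, assuming a fully associative LRU cache, where one pass over a trace yields miss counts for every cache size. The function names `stack_distances` and `misses_for_size` are hypothetical.

```python
from collections import Counter


def stack_distances(trace):
    """One pass over a block trace; returns a histogram of LRU stack distances.

    Key None counts first-time (cold) references. Assumes a fully
    associative LRU cache, which is what makes the inclusion property hold.
    """
    stack = []  # index 0 is the most recently used block
    hist = Counter()
    for block in trace:
        if block in stack:
            depth = stack.index(block)   # 0-based position in the LRU stack
            hist[depth + 1] += 1         # stack distance is 1-based
            stack.pop(depth)
        else:
            hist[None] += 1              # cold miss at every cache size
        stack.insert(0, block)           # block becomes most recently used
    return hist


def misses_for_size(hist, capacity):
    """Misses for a cache of `capacity` blocks: cold misses plus all
    references whose stack distance exceeds the capacity."""
    return sum(n for dist, n in hist.items() if dist is None or dist > capacity)


if __name__ == "__main__":
    hist = stack_distances(["a", "b", "a", "c", "b", "a"])
    for size in (1, 2, 3):
        print(size, misses_for_size(hist, size))
```

Because a smaller LRU cache always holds a subset of what a larger one holds, the single histogram answers the miss-count question for all sizes without re-running the trace.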
Write-buffers have a significant impact on performance, especially in wide-issue superscalar systems with write-through caching. We develop fast, efficient simulation methods for evaluating multiple write-buffer configurations together in a single pass. Our results are also applicable for the simulation of other buffer structures. We first consider simulating non-coalescing write-buffers. We show that...
This paper presents a new fast way to simulate large networks of computers. The method uses a frontend EC, which accepts a parallel C program and translates it into a program in an intermediate language for parallel system simulations. An event driven simulator for distributed shared memory systems, DSIM, uses the intermediate language to simulate and obtain efficiency results in networks of thousands...
Fast computer simulation is an essential tool in the design of large parallel computers. We discuss the design and performance of our Fast Accurate Simulation Tool, FAST. We start by summarizing the tradeoffs made in the designs of this and other simulators. The key ideas used in this simulator involve execution driven simulation techniques that modify the object code of the application program being...
HASE is a hierarchical computer architectural design and simulation environment which allows a computer architecture and its implementation to be represented graphically at several different levels of abstraction. Class inheritance is used to factor out common behaviour, for example, producer/consumer synchronisations, queue and statistics functions inherited by the memory, bus and function unit hierarchies...
The development of accurate trace-driven simulation models has become a key activity in the design of new high-performance computer systems. Trace-driven simulation is fast, enabling analysis of the behaviour of large application and benchmark programs on a new computer system. We describe a trace-driven simulation engine for a decoupled processor architecture. We report on two ways of generating...
We have developed a special-purpose computer for computational fluid dynamics, DREAM-1A. DREAM-1A has a peak speed of 80 Mflops and a memory size of 1.6 Gbyte. DREAM-1A consists of four units connected in a one-dimensional bidirectional ring network. One unit of DREAM-1A has one vector processing unit (VPU) and one hard-disk unit. The physical variables are stored on hard disk, instead of RAM. The...
The authors have designed and built HARP (Hermite AcceleratoR Pipeline)-1, a special-purpose computer for solving astronomical N-body problems with high accuracy using the Hermite integrator. The Hermite integrator uses analytically calculated time derivatives of the acceleration, in addition to the acceleration, to integrate orbits of particles. HARP-1 has a 24-stage pipeline to perform the calculation...
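As a rough illustration of what "analytically calculated time derivatives of the acceleration" means, the sketch below computes the Newtonian acceleration and its first time derivative (the jerk) that one point mass exerts on another. It is a software sketch under stated assumptions, not a description of the HARP-1 pipeline: the function name `accel_and_jerk`, the unit choice `G = 1.0`, and the absence of any softening term are all assumptions made here.

```python
import math


def accel_and_jerk(pos_i, vel_i, pos_j, vel_j, mass_j, G=1.0):
    """Acceleration and jerk on particle i due to particle j.

    With r = x_j - x_i and v = v_j - v_i:
        a = G m_j r / |r|^3
        j = G m_j (v / |r|^3 - 3 (r . v) r / |r|^5)
    Assumes point masses with no softening (hypothetical simplification).
    """
    r = [pj - pi for pi, pj in zip(pos_i, pos_j)]
    v = [vj - vi for vi, vj in zip(vel_i, vel_j)]
    r2 = sum(x * x for x in r)                  # |r|^2
    rv = sum(x * y for x, y in zip(r, v))       # r . v
    inv_r3 = 1.0 / (r2 * math.sqrt(r2))         # 1 / |r|^3
    acc = [G * mass_j * x * inv_r3 for x in r]
    jerk = [G * mass_j * (y * inv_r3 - 3.0 * rv * x * inv_r3 / r2)
            for x, y in zip(r, v)]
    return acc, jerk


if __name__ == "__main__":
    a, j = accel_and_jerk((0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                          (2.0, 0.0, 0.0), (0.0, 1.0, 0.0), mass_j=1.0)
    print(a, j)
```

In a full integrator these pairwise terms would be summed over all other particles and fed into a predictor-corrector step; that part is omitted here.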
Describes an application-specific LSI, the HARP (Hermite AcceleratoR Pipe) chip, which will be used in GRAPE-4, a massively-parallel special-purpose computer for astrophysical N-body simulations. The HARP chip calculates the gravitational interaction between particles. It consists of 15 floating point arithmetic units and one unit for function evaluation. The HARP chip performs about 20 floating point...
It is essential to extract fine grain parallelism for further increase of processor performance. This paper investigates an extension model of VLIW architecture called V++, which retains the capabilities of VLIW architecture to effectively exploit fine grain parallelism while introducing facilities for restructuring very long instruction words dynamically. V++ adopts two types of restructuring methods:...
The increasing disparity in speed between the processor and its main memory makes way for multi-level cache hierarchies in almost all of today's computer systems; specifically, second-level (L2) caches with larger capacity but longer access time than first-level (L1) caches have been adopted to reduce this memory gap. In this study an enhanced one-pass trace-driven simulation technique is used...
Compares the performance, in shared-memory multiprocessors, of locating translation-lookaside buffers (TLBs) at processors with that of locating TLBs at memory. The comparison is based on trace-driven simulations of multiprocessors with log N-stage networks interconnecting N processors and N memory modules. For the systems and workloads studied, memory-based TLBs perform noticeably better than processor-based...