Data dependence analysis (DDA) is essential for any automatic parallelizing compiler to determine the parallelizability of given portions of programs. Several techniques and tests for analyzing data dependence between array elements have already been proposed. An examination of these conventional DDA techniques makes clear that there is a trade-off between their analysis speed and exactness of their...
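As an illustration of the fast-but-inexact end of that trade-off, here is a minimal sketch of the classic GCD test. It is a textbook example and is not taken from the abstract, which does not name the tests it examines. The function name `gcd_test` and its parameters are hypothetical. For a write to `A[a*i + k1]` and a read from `A[b*j + k2]`, a dependence requires an integer solution of `a*i - b*j = c` with `c = k2 - k1`, which exists only if `gcd(a, b)` divides `c`. The test ignores loop bounds, so a `True` result means only that a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int) -> bool:
    """Return True if a*i - b*j = c may have an integer solution.

    False proves independence; True only means a dependence
    cannot be ruled out (loop bounds are ignored).
    """
    g = gcd(a, b)  # math.gcd returns a non-negative value
    if g == 0:
        # Both coefficients are zero: the subscripts are the constants k1, k2.
        return c == 0
    return c % g == 0


# A[2*i] versus A[2*j + 1]: even and odd elements never overlap.
print(gcd_test(2, 2, 1))
# A[2*i] versus A[4*j + 6]: the test cannot exclude a dependence.
print(gcd_test(2, 4, 6))
```

The test runs in time proportional to a single GCD computation, which is why it is cheap, and its disregard of loop bounds is why it is inexact.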
Response time in database systems is not shrinking as processor speeds increase, because of the growing gap between processor speed and memory speed and the increase in data size. A conventional memory controller and the caches in a processor cannot provide enough data transfer bandwidth between the processor and memory. For fast processing of large data, it is effective to equip a...
Chip multiprocessor (CMP) architecture has been attracting much attention as a next-generation microprocessor architecture, and many kinds of CMP are being widely researched. However, CMP architectures face several difficulties in the effective use of memory, especially cache or local memory near a processor core. The authors have proposed OSCAR CMP architecture, which cooperatively works with multigrain parallelizing...
This paper presents a new approach to dependence testing in the presence of nonlinear and non-closed array index expressions and pointer references. The chains of recurrences formalism and algebra are used to analyze the recurrence relations of induction variables and to construct recurrence forms of array index expressions and pointer references. We use these recurrence forms to determine if...
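To make the recurrence forms mentioned above concrete, here is a minimal sketch of how a purely additive chain of recurrences such as {c0, +, c1, +, c2} can be evaluated over successive loop iterations. It follows the standard textbook definition of the formalism and is not the paper's dependence test. The helper name `cr_values` is hypothetical.

```python
def cr_values(coeffs, n):
    """Return the first n values of the additive chain {c0, +, c1, +, ...}.

    Each iteration emits the leading term, then adds every term's
    successor into it, so a polynomial index expression is produced
    using additions only.
    """
    state = list(coeffs)
    out = []
    for _ in range(n):
        out.append(state[0])
        # Ascending order ensures state[k + 1] still holds its old value.
        for k in range(len(state) - 1):
            state[k] += state[k + 1]
    return out


# The index expression i*i corresponds to the chain {0, +, 1, +, 2}.
print(cr_values([0, 1, 2], 4))
# The affine expression 3 + 2*i corresponds to {3, +, 2}.
print(cr_values([3, 2], 4))
```

Because a nonlinear index such as `i*i` has this compact form, its range and monotonicity can be reasoned about without a closed-form linear subscript.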
We describe GXP, a shell for distributed multi-cluster environments. With GXP, users can quickly submit a command to many nodes simultaneously (approximately 600 milliseconds on over 300 nodes spread across five local-area networks). It therefore brings an interactive and instantaneous response to many cluster/network operations, such as trouble diagnosis, parallel program invocation, installation...
This paper proposes a new, autonomous and dynamic optimization framework, called a meta-level computation. In this framework, a meta-level processor acquires the execution profile of a base-level processor, i.e. a conventional von Neumann machine, produces the optimized base-level configuration and performs the reconfiguration. We define the meta-level computation model based on the considerations...
The US Federal Government has convened a major committee to determine future directions for government sponsored high end computing system acquisitions and enabling research. The High End Computing Revitalization Task Force was inaugurated in 2003 involving all Federal agencies for which high end computing is critical to meeting mission goals. As part of the HECRTF agenda, a multi-day community wide...
Instruction queues consume a significant amount of power in high-performance processors, primarily due to instruction wakeup logic access to the queue structures. The wakeup logic delay is also a critical timing parameter. This paper proposes a new queue organization using a small number of successor pointers plus a small number of dynamically allocated full successor bit vectors for cases with a...
Microprocessor performance has improved at about 55% per year for the past three decades. To maintain this performance growth rate, next generation processors must achieve higher levels of instruction level parallelism. However, it is known that a conditional branch poses serious performance problems in modern processors. In addition, as an instruction pipeline becomes deep and the issue width becomes...
This paper presents how to build inexpensive personal supercomputers that continue to benefit from commercial-off-the-shelf (COTS) components after the demise of vector supercomputer vendors. The design realizes this goal without any modification to the CPU, the bridge chips on the motherboard, or the memory chips. Simply plugging in a new memory module with a vector load/store function makes an inexpensive home-use personal...
This paper proposes a fault-tolerant fully adaptive deadlock-recovery routing algorithm for k-ary n-cube networks. We intend to consider both the adaptability for faults and the communication performance by integrating regular and irregular network routing. Our algorithm tolerates any number or shape of faults without disabling fault-free nodes by maintaining routing tables that are configured based...
This work presents an efficient multi-banked architecture of the register file, and low-power compiler support which reduces energy consumption in this device by more than 78%. The key idea of this work is based on a quasi-deterministic interpretation of the register assignment task and the use of voltage scaling techniques.
A three-dimensional fluid code, IMPACT-3D, has been parallelized with High Performance Fortran (HPF) on the Earth Simulator. IMPACT-3D is an implosion analysis code using a TVD scheme, which performs three-dimensional compressible and inviscid Eulerian fluid computation with the explicit 5-point stencil scheme for spatial differentiation and the fractional time step for time integration. The third dimension...
Virtually all of the discussion on "commodity" vs. "custom" architectures, especially for highly parallel systems, has focused on the high-glamor, high complexity processor core. This paper takes a different tack - it explores the potential for directly attacking the memory wall by programming the classically "dumb" memory interface. Several related but separable techniques...
In an SMT processor, the increase in the register contexts of threads requires a large number of physical registers. Moreover, a physical register file in an SMT processor requires more ports for the execution units, which causes significant growth in the area, access time and power consumption of the register file. These problems are critical hurdles to implementing a large-scale SMT processor. Especially,...
Currently, many people enjoy multimedia applications with image and audio processing on PCs, PDAs, mobile phones and so on. With the popularization of multimedia applications, the need for low-cost, low-power, high-performance processors has been increasing. To this end, chip multiprocessor architectures which allow us to attain scalable performance improvement by using multigrain...