Limited power budgets will be one of the biggest challenges in deploying future exascale supercomputers. One promising way to deal with this challenge is hardware overprovisioning, that is, installing more hardware resources than can be fully powered under a given power limit, coupled with software mechanisms that steer the limited power to where it is needed most. Prior research has demonstrated...
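To make "steering a fixed budget" concrete, here is a minimal sketch of one simple allocation policy, water-filling, assuming each node reports a power demand in watts. The function name `waterfill_caps` and the policy itself are illustrative assumptions, not the software mechanism studied in this work.

```python
def waterfill_caps(demands, budget):
    """Water-filling power caps (illustrative policy, not the paper's method).

    Each node receives min(demand, level), where the common level is chosen so
    that the caps sum to the budget. If the budget covers every demand, each
    node simply gets its full demand.
    """
    n = len(demands)
    if sum(demands) <= budget:
        return list(demands)
    caps = [0.0] * n
    remaining = budget
    # Visit nodes from smallest to largest demand.
    order = sorted(range(n), key=lambda i: demands[i])
    for k, i in enumerate(order):
        share = remaining / (n - k)  # equal split among nodes not yet capped
        if demands[i] <= share:
            caps[i] = demands[i]     # small demand: satisfy it fully
            remaining -= demands[i]
        else:
            # Every remaining node demands at least this much, so all of them
            # get the same cap and the budget is used exactly.
            for j in order[k:]:
                caps[j] = share
            break
    return caps


# Four nodes whose demands total 900 W under a 700 W limit.
print(waterfill_caps([100, 200, 300, 300], 700))  # [100, 200, 200.0, 200.0]
```

With a hypothetical 300 W per-node peak, the same 700 W would fully power only two nodes; the overprovisioned configuration keeps all four running, leaving the low-demand nodes uncapped and trimming only the hungriest ones.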
A limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning, which installs more nodes than the power constraint would normally allow, is an attractive way to design next-generation supercomputers. In air-cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and...
Modern on-chip networks (NoCs) rely on virtual channel (VC) flow control to allow effective utilization of link bandwidth, at the cost of more power and longer per-hop latency. Despite the many existing optimization techniques for NoCs under VC flow control, we go a step further and question its necessity. We find that when the network is not busy, circuit switching (CS) may already satisfy the...
As a limited power budget is becoming one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning, which installs a larger number of nodes than the power constraint would permit if each node were provisioned for its Thermal Design Power, is an attractive way to design extreme-scale supercomputers. In this design, the power consumption of each node should be controlled by power knobs equipped...
On-chip interconnection (or NoC) is a major contributor to the performance and power of modern and future multicore processors. So far, many optimization techniques have been developed to improve its bandwidth, latency, and power consumption. However, it is not clear how energy efficiency is affected, since an optimization technique normally comes with overheads. This paper thus attempts to address when and how...
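A simple first-order way to frame the question, assuming an optimization multiplies power by $(1+\alpha)$ and divides execution time by a speedup $s$ (both symbols introduced here for illustration), is to compare the energy $E = PT$ before and after applying it:

```latex
\frac{E'}{E} = \frac{(1+\alpha)\,P \cdot T/s}{P \cdot T} = \frac{1+\alpha}{s},
\qquad
E' < E \iff s > 1 + \alpha .
```

For example, a technique that adds 10% power ($\alpha = 0.1$) reduces energy only if it speeds execution up by more than $1.1\times$.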
This paper proposes a new hardware barrier mechanism that offers the flexibility to select which cores join the synchronization, allowing multiple multi-threaded applications to run by dividing a many-core processor into several groups. Experimental results based on an RTL simulation show that our hardware barrier achieves a 66-fold reduction in latency over typical software-based implementations,...
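A minimal behavioral sketch of a core-selectable barrier, assuming each group has a participation bitmask and the barrier releases when the arrival bitmask matches it. The class and method names are hypothetical, and this is not the RTL design evaluated in the paper.

```python
class GroupBarrier:
    """Behavioral model of a barrier whose participants are chosen by bitmask.

    Hypothetical sketch: each group has a participation mask, and the barrier
    releases when the arrival mask equals it. Disjoint groups synchronize
    independently, so one many-core chip can host several applications.
    """

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.masks = {}    # group id -> bitmask of participating cores
        self.arrived = {}  # group id -> bitmask of cores that have arrived

    def configure(self, group, cores):
        mask = 0
        for core in cores:
            if not 0 <= core < self.num_cores:
                raise ValueError(f"core {core} out of range")
            mask |= 1 << core
        self.masks[group] = mask
        self.arrived[group] = 0

    def arrive(self, group, core):
        """Record an arrival; return True if it completes the barrier."""
        bit = 1 << core
        if not self.masks[group] & bit:
            raise ValueError(f"core {core} is not in group {group}")
        self.arrived[group] |= bit
        if self.arrived[group] == self.masks[group]:
            self.arrived[group] = 0  # reset for the next barrier episode
            return True
        return False


barrier = GroupBarrier(num_cores=8)
barrier.configure(group=0, cores=[0, 1, 2, 3])  # application A
barrier.configure(group=1, cores=[4, 5, 6, 7])  # application B
print([barrier.arrive(0, c) for c in (0, 1, 2)])  # [False, False, False]
print(barrier.arrive(1, 4))  # False: group 1 is independent of group 0
print(barrier.arrive(0, 3))  # True: last member of group 0 arrived
```

Comparing a whole bitmask in one step is what makes a hardware version cheap; a software barrier typically needs shared-memory counters and polling instead.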
A key challenge in next-generation supercomputing is to effectively schedule limited power resources. Modern processors suffer from increasingly large power variations due to the chip manufacturing process. These variations lead to power inhomogeneity in current systems and manifest as performance inhomogeneity in power-constrained environments, drastically limiting supercomputing performance. We...
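A minimal sketch of why a uniform per-node cap hurts a bulk-synchronous job and how a variation-aware split can help, assuming (purely for illustration) a linear model in which node i needs `static_w[i] + coeff[i] * f` watts to run at performance f. The linear model, parameter names, and numbers are assumptions, not measurements or the scheduler proposed in the paper.

```python
def uniform_performance(static_w, coeff, budget):
    """Per-node performance when the budget is split evenly across nodes."""
    share = budget / len(static_w)
    return [(share - s) / c for s, c in zip(static_w, coeff)]


def equalized_allocation(static_w, coeff, budget):
    """Give every node the power it needs to reach one common performance f.

    Solving sum(s_i + c_i * f) = budget gives f = (budget - sum s) / sum c.
    """
    f = (budget - sum(static_w)) / sum(coeff)
    return f, [s + c * f for s, c in zip(static_w, coeff)]


static_w = [20, 30]  # hypothetical: node 1 needs more power for the same work
coeff = [1, 1]
budget = 200
# A bulk-synchronous job runs at the pace of its slowest node.
print(min(uniform_performance(static_w, coeff, budget)))  # 70.0
print(equalized_allocation(static_w, coeff, budget))      # (75.0, [95.0, 105.0])
```

Under the same 200 W budget, shifting power toward the less efficient node raises the pace of the slowest node from 70 to 75 in this toy model.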
This paper describes a new approach to reduce the ground bounce (GB) while keeping the wakeup time short for fine-grain power gating. We propose a novel algorithm to synthesize an optimal unbalanced buffer tree (UBT) that turns on parallel power switches with slight time differences. We have applied our algorithm to function units of a 32-bit microprocessor. Experimental results have revealed that...
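The trade-off between ground bounce and wakeup time can be illustrated with a toy model, assuming each power switch draws a rectangular inrush-current pulse and switches turn on at uniform intervals. The actual UBT synthesis chooses non-uniform delays through the buffer tree; the uniform stagger, the function name, and the use of overlapping pulses as a proxy for peak current are simplifying assumptions.

```python
def staggered_wakeup(num_switches, pulse_ns, delay_ns):
    """Toy model of staggered power-switch turn-on.

    Switch k turns on at k * delay_ns and draws a rectangular inrush pulse
    lasting pulse_ns. Returns (peak number of simultaneously active pulses,
    time until the last pulse ends). The peak count is a rough proxy for
    peak rush current and hence ground bounce.
    """
    starts = [k * delay_ns for k in range(num_switches)]
    # The overlap count can only rise at a pulse start, so check each start.
    peak = max(
        sum(1 for s in starts if s <= t < s + pulse_ns) for t in starts
    )
    wakeup_ns = starts[-1] + pulse_ns
    return peak, wakeup_ns


for delay in (0, 0.5, 1, 2):
    print(delay, staggered_wakeup(8, 2, delay))
```

In this model, widening the stagger from 0 ns to 2 ns cuts the peak from 8 overlapping pulses to 1 while stretching wakeup from 2 ns to 16 ns, which is the tension a synthesized buffer tree has to balance.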
Power-performance efficiency remains a primary concern for microprocessor designers. One source of power inefficiency in recent LSI chips is increasing leakage power consumption. Power gating is a well-known technique for reducing leakage power consumption by switching off the power supply to idle logic blocks. Recently, fine-grained power gating has emerged as a technique to minimize...
This paper presents the design and control scheme of a microprocessor whose internal function units are power gated on an instruction-by-instruction basis. Enabling and disabling power gating is adaptively controlled, with the support of on-chip leakage monitors and the operating system, to minimize the energy overhead due to sleep-in and wakeup. Measured results of the fabricated chip in 65 nm CMOS technology...
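The energy accounting behind such adaptive control can be sketched as follows, assuming a fixed sleep-in plus wakeup energy overhead, a leakage power estimate of the kind an on-chip monitor might supply, and negligible residual leakage while gated. The function names and numbers are hypothetical, not values from the fabricated chip.

```python
def break_even_time_ns(overhead_pj, leak_mw):
    """Idle time at which the leakage energy saved equals the gating overhead.

    Units work out directly: pJ / mW = 1e-12 J / 1e-3 W = 1e-9 s = ns.
    """
    return overhead_pj / leak_mw


def net_saving_pj(idle_ns, overhead_pj, leak_mw):
    """Energy saved by gating for idle_ns (mW * ns = pJ); negative is a loss."""
    return leak_mw * idle_ns - overhead_pj


def should_gate(expected_idle_ns, overhead_pj, leak_mw):
    """Enable power gating only if the expected idle period exceeds the BET."""
    return expected_idle_ns > break_even_time_ns(overhead_pj, leak_mw)


overhead_pj, leak_mw = 50.0, 2.0  # hypothetical values
print(break_even_time_ns(overhead_pj, leak_mw))  # 25.0
print(should_gate(10, overhead_pj, leak_mw), net_saving_pj(10, overhead_pj, leak_mw))    # False -30.0
print(should_gate(100, overhead_pj, leak_mw), net_saving_pj(100, overhead_pj, leak_mw))  # True 150.0
```

A lower measured leakage raises the break-even time, so the same idle interval may no longer justify gating; this is why a leakage monitor is useful input to the enable/disable decision.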
Cube-1 is a heterogeneous multi-core processor that can achieve the required performance with as little energy consumption as possible. It can control performance and energy at two levels: (1) the number of accelerators can easily be changed after fabrication by increasing or decreasing the number of stacked chips, which are connected with inductive coupling links. (2) The supply voltage...
Recent battery-driven IT devices, including smartphones and tablets, require versatile functions and high performance with low energy consumption. On the other hand, the initial cost of an LSI for design and mask development has increased rapidly, and developing an SoC (System-on-a-Chip) for each product has become difficult. Although flexible reconfigurable architectures can be a solution, the performance...
A scalable heterogeneous multi-core processor is developed. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multi-core accelerators improves computational energy efficiency through proper task assignment and massively parallel computing. The stacked chips interconnect through a scalable 3D Network on Chip (NoC). By simply changing the number of stacked accelerator chips, processor...
The authors developed a scalable heterogeneous multicore processor. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multicore accelerators enables various trade-offs between performance and energy consumption. The stacked chips interconnect through a scalable 3D network on a chip (NoC). By simply changing the number of stacked accelerator chips, processor parallelism can...
Cube-2 is a prototype of a building-block scalable reconfigurable accelerator using an inductive coupling interconnect. It consists of an ultra-low-leakage embedded processor, Geyser, and coarse-grained reconfigurable accelerators, CMA (Cool Mega Array). A Geyser chip and multiple CMA chips are stacked, and a powerful network is formed using the inductive coupling interconnect. The performance can...
CMA-Cube is the second prototype of a building-block scalable reconfigurable accelerator using an inductive coupling interconnect. It uses the wireless inductive coupling interconnect as a packet-switching network that connects the accelerators. As the accelerator core, CMA (Cool Mega Array), which consists of a large coarse-grained PE array with combinatorial circuits and a tiny microcontroller, is applied...
SLD (Silent Large Datapath)-1 is a prototype accelerator for media processing consisting of a large Processing Element (PE) array of 8 × 8 24-bit PEs with combinatorial circuits, and a small microcontroller for data memory access. It was fabricated as a 2.1 mm × 4.2 mm die in 65 nm CMOS, and achieves a sustained performance of 1.356 GOPS at 11 mW by reducing clock tree overhead and through the benefit of voltage...
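The reported sustained figures translate into an energy efficiency and an energy per operation as follows (simple arithmetic on the 1.356 GOPS and 11 mW numbers quoted for SLD-1):

```latex
\frac{1.356\ \text{GOPS}}{11\ \text{mW}} \approx 123\ \text{GOPS/W},
\qquad
\frac{11 \times 10^{-3}\ \text{W}}{1.356 \times 10^{9}\ \text{op/s}} \approx 8.1\ \text{pJ/op}.
```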
Cool Mega-Array (CMA) is an energy-efficient reconfigurable accelerator for battery-driven mobile devices. It has a large processing-element array without memory elements for mapping an application's data-flow graph, a simple programmable microcontroller for data management, and data memory. Unlike coarse-grained dynamically reconfigurable processors, CMA reduces power consumption by switching hardware...
This paper describes adaptive fine-grain control to power gate function units based on a temperature-dependent break-even time (BET). An analytical model expressing the temperature-dependent BET is introduced, and its accuracy is examined. Results demonstrated that the model captures the exponential decrease in BET with temperature. Meanwhile, it was found that the accuracy gets...
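The qualitative behavior described here, BET falling exponentially as temperature rises, follows if the gating energy overhead stays roughly constant while leakage grows exponentially with temperature. A minimal sketch under that assumption, with leakage taken (hypothetically) to double every `t_double_c` degrees; this is not the paper's analytical model, and all numbers are illustrative.

```python
def bet_ns_at(temp_c, overhead_pj, leak_ref_mw, temp_ref_c, t_double_c):
    """Break-even time at temp_c under an assumed exponential leakage model.

    Assumes leakage power doubles every t_double_c degrees Celsius and that the
    sleep-in/wakeup energy overhead does not depend on temperature.
    """
    leak_mw = leak_ref_mw * 2 ** ((temp_c - temp_ref_c) / t_double_c)
    return overhead_pj / leak_mw  # pJ / mW = ns


for temp in (25, 35, 45, 55):
    print(temp, bet_ns_at(temp, overhead_pj=50.0, leak_ref_mw=2.0,
                          temp_ref_c=25, t_double_c=10))
```

Under these assumptions BET halves with every 10 °C rise, so a controller tracking temperature can gate shorter idle periods profitably when the chip runs hot.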