The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Systems based on many Chip Multi-Processor (CMP) modules interconnected by global networks constitute now a feasible solution, which brings back to life challenges of massively parallel systems. The paper presents new methods for data communication inside CMP modules and for inter-CMP-module data communication. Inside CMP modules data communication through shared variables is improved by the use of...
As technology scaling enables the integration of billions of transistors on a chip, economies of scale are prompting the move toward parallel chip architectures with application-specific systems-on-a-chip (SoC) leveraging multiple cores on a single chip for better performance at manageable design costs. The demand for communicating capability of many-core SoC will definitely increase because the traffic...
On a multi-socket architecture with load below peak, as is often the case in a server installation, it is common to consolidate load onto fewer sockets to save processor power. However, this can increase main memory power consumption due to the decreased total cache space. This paper describes inter-socket victim cacheing, a technique that enables such a system to do both load consolidation and cache...
Introducing POWER7™ the latest member of the IBM POWER™ processor family. A 567 mm2 chip implemented in 45nm SOI technology, holding eight quad threaded cores, a 32MB shared eDRAM L3, two memory controllers and high bandwidth SMP interfaces. The new out of order, shallow pipeline core with 12 execution units, multiport L1 caches and a private 256 kB L2 offers the efficiency to support 4× the number...
With the wide availability of chip multi-processing (CMP), software developers are now facing the task of effectively parallelizing their software code. Once they have identified the areas of parallelization, they will need to know the level of code granularity needed to ensure profitable execution. Furthermore, this problem multiplies itself with different hardware available. In this paper, we present...
On-chip SRAM caches have come to dominate the total chip area and leakage power consumed in state-of-the-art microprocessor designs. Such large memories are necessary to attain high performance, however it is critical to minimize the idle currents drawn while these SRAM banks are inactive. This work proposes a novel voltage reduction technique to reduce SRAM leakage power during the standby mode....
HyperTransport link is a high performance IO interface for system connection. In this paper, the architecture of a HyperTransport interface is introduced. This HyperTransport interface realizes efficient HT-AXI bidirectional transformation, where AXI is a popular bus protocol in SOC architectures. Furthermore, this HyperTransport interface provides dedicated hardware support for cache coherence protocol...
This paper examines the architecture, algorithm and implementation of a switch-based multi-processor realization of the fast Fourier transform (FFT). The architecture employs M processing elements (PEs), and provides a speedup of M compared with systems that use a single PE. An algorithm is provided to detect and resolve memory conflicts. A CMOS implementation of a four-PE processor is presented....
In this paper, we will study the on-chip network and memory hierarchy design of the Godson-T - a homogeneous many-core processor. Godson-T has 64 cores (with private L1 cache), and 16 global L2 cache banks. All these on-chip units are connected by a 2D 8 ?? 8 mesh network. Our study reveals that:(a) Global on-chip L2 cache can effectively alleviate the memory pressure caused by the data-thirsty on-chip...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.