In this paper, we explore the pessimistic voltage guardbands of two multicore x86-64 microprocessor chips that belong to different microarchitectures (one ultra-low power and one high-performance microprocessor), when programs are executed on individual cores of the CPU chips. We also examine the energy and temperature gains as positive effects of lowering the voltage in both chips while preserving...
Defect density and variabilities in values of parameters continue to grow with each new generation of nano-scale fabrication technology. In SRAMs, variabilities reduce yield and necessitate extensive interventions, such as the use of increasing numbers of spares to achieve acceptable yield. For most microprocessor chips, the number of SRAM bits is expected to grow 2× for every generation. Consequently,...
General purpose GPU Computing (GPGPU) has taken off in the past few years, with great promises for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code...
High-performance processors suffer from soft error vulnerability due to the increasing on-chip transistor density, shrinking processor feature size, lower threshold voltage, etc. In this paper, we propose to use a rule search strategy, i.e. Patient Rule Induction Method (PRIM), to optimize processor soft error robustness. By exploring a huge microarchitectural design space on the Architectural Vulnerability...
Mobile-class nodes, also known as netbooks, have become increasingly popular in the personal computing market. As is the trend in the computing market, the processors in these mobile-class nodes are moving from 32 bits to 64 bits. This move extends the memory ceiling beyond the traditional 4GB, allowing for a significantly larger virtual address space. In addition, on the x86_64 architecture, 64-bit...
For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most recent research addresses contention for shared resources by single-threaded applications. However,...
The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access as tags and data banks from all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reduce this overhead remove unnecessary...
Static cache analysis for data allocated on the heap is practically impossible for standard data caches. We propose a distinct object cache for heap allocated data. The cache is highly associative to track symbolic object addresses in the static analysis. Cache lines are organized to hold single objects and individual fields are loaded on a miss. This cache organization is statically analyzable and...
As a result of the exponentially increased power consumption of IC chips, how to deal with the heat generated by processors has become a major concern in the design of computing systems. Recently, extensive theoretical research results on dynamic thermal-aware computing have been published in the literature. However, few experimental studies have been reported based on practical computing platforms...
When several applications run on Chip-Multiprocessors (CMPs), allocating on-chip cache capacity among the competing applications becomes a problem. Cache partitioning is commonly used to solve this problem. Existing cache partitioning schemes are either dedicated to the shared design or partition the last-level cache based on limited memory information. This paper presents Private...
Transactional memory (TM) is a new shared-resource synchronization mechanism that was proposed to ease the difficulty of parallel programming. Currently, most hardware transactional memory systems leverage an extended directory-based cache coherence protocol to resolve transaction conflicts; little research has been conducted on extending a snoopy-coherence-based chip multiprocessor with TM support...
In general, the Least Recently Used (LRU) policy is commonly employed to manage the shared L2 cache in chip multiprocessors. However, previous studies show that the LRU policy has some deficiencies. In particular, LRU may perform poorly when the workloads of application programs are larger than the L2 cache, because there are usually a great number of less reused lines that are never reused or reused...
This paper presents the results of porting the Extended Kalman Filter (EKF) Simultaneous Localization and Mapping (SLAM) Natural Feature Tracking (NFT) algorithm using the Automatically Tuned Linear Algebra Software (ATLAS) for use in Tilera's Tile64 or OPERA's Radiation-Hardened By Design (RHBD) Maestro chip. This implementation of EKF SLAM was previously analyzed for performance on a RAD750...
Three-dimensional integration has the potential to increase integration density and to reduce communication latency of chip-multiprocessors (CMPs). However, high power density (i.e., power dissipation per unit volume) due to the high integration incurs temperature-related problems in reliability, power consumption, performance, and system cooling cost. In this paper, we propose a design-time solution...
With continued scaling of transistor feature size, aggressive use of power saving techniques exacerbates the Power Supply Noise (PSN) problem in high performance microprocessors. PSN in an integrated circuit depends on the interplay of the intrinsic circuit characteristics and the runtime execution of programs on the circuit. Consequently, accurate estimation of PSN in a microprocessor requires...
On a CMP (Chip Multi-Processor) architecture, cache sharing impacts threads non-uniformly: some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as decreased throughput and cache thrashing. This paper proposes an architectural support predicting method (ASPM) to predict inter-thread cache contention, and schedules threads based on...
Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. MorphCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist...
Translation Lookaside Buffers (TLBs) are critical to processor performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as chip multiprocessors (CMPs) become ubiquitous, TLB design must be re-evaluated. This paper is the first to propose and evaluate shared last-level (SLL) TLBs as an alternative to the commercial norm of private, per-core L2...
Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads. The performance scalability of these systems, however, is limited due to partitioning overheads. This paper addresses two of the key performance scalability limitations of composable multicore systems. We present a critical path analysis revealing that communication needed for cross-core...
In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require significant...