Search results

chapter

Voltage margins identification on commercial x86-64 multicore microprocessors

George Papadimitriou, Manolis Kaliorakis, Athanasios Chatzidimitriou, Charalampos Magdalinos, et al.

2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design (IOLTS) > 51 - 56

2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design (IOLTS)

In this paper, we explore the pessimistic voltage guardbands of two multicore x86-64 microprocessor chips that belong to different microarchitectures (one ultra-low power and one high-performance microprocessor), when programs are executed on individual cores of the CPU chips. We also examine the energy and temperature gains as positive effects of lowering the voltage in both chips while preserving...

chapter

Salvaging chips with caches beyond repair

Hsunwei Hsiung, Byeongju Cha, Sandeep K. Gupta

2012 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1263 - 1268

2012 Design, Automation & Test in Europe Conference & Exhibition (DATE 2012)

Defect density and variabilities in values of parameters continue to grow with each new generation of nano-scale fabrication technology. In SRAMs, variabilities reduce yield and necessitate extensive interventions, such as the use of increasing numbers of spares to achieve acceptable yield. For most microprocessor chips, the number of SRAM bits is expected to grow 2× for every generation. Consequently,...

chapter

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

C Gregg, K Hazelwood

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 134 - 144

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

General purpose GPU Computing (GPGPU) has taken off in the past few years, with great promises for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups of as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code...

chapter

Universal rules guided design parameter selection for soft error resilient processors

Lide Duan, Ying Zhang, Bin Li, Lu Peng

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 247 - 256

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

High-performance processors suffer from soft error vulnerability due to the increasing on-chip transistor density, shrinking processor feature size, lower threshold voltage, etc. In this paper, we propose to use a rule search strategy, i.e. Patient Rule Induction Method (PRIM), to optimize processor soft error robustness. By exploring a huge microarchitectural design space on the Architectural Vulnerability...

chapter

Performance characterization of mobile-class nodes: Why fewer bits is better

M McDaniel, K Hazelwood

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 131 - 132

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

Mobile-class nodes, also known as netbooks, have become increasingly popular in the personal computing market. As is the trend in the computing market, the processors in these mobile-class nodes are moving from 32 bits to 64 bits. This move extends the memory ceiling beyond the traditional 4GB, allowing for a significantly larger virtual address space. In addition, on the x86_64 architecture, 64-bit...

chapter

Characterizing multi-threaded applications based on shared-resource contention

T Dey, Wei Wang, J W Davidson, M L Soffa

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 76 - 86

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most of the recent research addresses contention for shared resources by single-threaded applications. However,...

chapter

Link-time optimization for power efficiency in a tagless instruction cache

T M Jones, S Bartolini, J Maebe, D Chanet

International Symposium on Code Generation and Optimization (CGO 2011) > 32 - 41

2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011)

The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access as tags and data banks from all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reduce this overhead remove unnecessary...

chapter

A Time-Predictable Object Cache

M Schoeberl

2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing > 99 - 105

2011 IEEE 14th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC 2011)

Static cache analysis for data allocated on the heap is practically impossible for standard data caches. We propose a distinct object cache for heap allocated data. The cache is highly associative to track symbolic object addresses in the static analysis. Cache lines are organized to hold single objects and individual fields are loaded on a miss. This cache organization is statically analyzable and...

chapter

Thermal aware scheduling on an Intel desktop computer

Guanglei Liu, Gang Quan

2011 Proceedings of IEEE Southeastcon > 79 - 84

IEEE SoutheastCon 2011. Building Global Engineers

As a result of the exponentially increased power consumption of IC chips, how to deal with the heat generated by processors has become a major concern in the design of computing systems. Recently, we have seen extensive theoretical research results on dynamic thermal aware computing published in the literature. However, not many experimental studies have been reported based on practical computing platforms...

chapter

Private cache partitioning: A method to reduce the off-chip missrate of concurrently executing applications in Chip-Multiprocessors

Li Hao, Liu Tao, Liu Guanghui, Xie Lunguo

2011 3rd International Conference on Computer Research and Development > 4 > 254 - 259

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

When several applications run on Chip-Multiprocessors (CMPs), allocating on-chip cache capacity among these competing applications becomes a problem. Cache partitioning is commonly used to solve this problem. Existing cache partitioning schemes either are dedicated to the shared design or partition the last level cache depending on limited memory information. This paper presents Private...

chapter

Building efficient transactional memory support based on snoopy coherence

Zhengbin Pang, Shaogang Wang, Dan Wu, Jun Zhang, et al.

2011 3rd International Conference on Computer Research and Development > 1 > 46 - 50

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

Transactional memory (TM) is a new shared resource synchronization mechanism which was proposed to ease the difficulty of parallel programming. Currently, most hardware transactional memory systems leverage an extended directory-based cache coherence protocol to resolve transaction conflicts; little research has been conducted on extending a snoopy coherence based chip multi-processor with TM support...

chapter

SLRF: A High-efficiency Shared Less Reused Filter in Chip Multiprocessors

Fuming Qiao, Baozhong Yu, Jianliang Ma, Tianzhou Chen, et al.

2011 Fourth International Conference on Intelligent Computation Technology and Automation > 2 > 1191 - 1197

2011 International Conference on Intelligent Computation Technology and Automation (ICICTA)

In general, the Least Recently Used (LRU) policy is commonly employed to manage the shared L2 cache in Chip Multiprocessors. However, previous studies show that the LRU policy has some deficiencies. In particular, LRU may perform quite poorly when the workloads of application programs are larger than the L2 cache, because there are usually a great number of less reused lines that are never reused or reused...

chapter

Natural Feature Tracking on the OPERA Maestro platform

Timothy Gallagher, Saul H Weiss, Jessica Hahn

2011 Aerospace Conference > 1 - 7

2011 IEEE Aerospace Conference

This paper will present the results of porting the Extended Kalman Filter (EKF) Simultaneous Localization and Mapping (SLAM) Natural Feature Tracking (NFT) algorithm using the Automatically Tuned Linear Algebra Software (ATLAS) for use in Tilera's Tile64 or OPERA's Radiation-Hardened By Design (RHBD) Maestro chip. This implementation of EKF SLAM was previously analyzed for performance on a RAD750...

chapter

Maximizing throughput of temperature-constrained multi-core systems with 3D-stacked cache memory

Kyungsu Kang, Jongpil Jung, Sungjoo Yoo, Chong-Min Kyung

2011 12th International Symposium on Quality Electronic Design > 1 - 6

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

Three-dimensional integration has the potential to increase integration density and to reduce communication latency of chip-multiprocessors (CMPs). However, high power density (i.e., power dissipation per unit volume) due to the high integration incurs temperature-related problems in reliability, power consumption, performance, and system cooling cost. In this paper, we propose a design-time solution...

chapter

Integrated circuit-architectural framework for PSN aware floorplanning in microprocessors

M Padmawar, S Roy, K Chakraborty

2011 12th International Symposium on Quality Electronic Design > 1 - 7

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

With continued scaling of transistor feature size, aggressive use of power saving techniques exacerbates the Power Supply Noise (PSN) problem in high performance microprocessors. PSN in an integrated circuit depends on the interplay of the intrinsic circuit characteristics and the runtime execution of programs on the circuit. Consequently, accurate estimation of PSN in a microprocessor requires...

chapter

Architectural support predicting method for CMP scheduling

Gangyong Jia, Xi Li, Xuehai Zhou, Dong Dai

2011 3rd International Conference on Computer Research and Development > 4 > 10 - 14

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

On a CMP (Chip Multi-Processor) architecture, cache sharing impacts threads non-uniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as decreased throughput and cache thrashing. This paper proposes an architectural support predicting method (ASPM) to predict inter-thread cache contention, and schedules threads based on...

chapter

MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy

S Srikantaiah, E Kultursay, Tao Zhang, M Kandemir, et al.

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 231 - 242

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. MorphCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist...

chapter

Shared last-level TLBs for chip multiprocessors

A Bhattacharjee, D Lustig, M Martonosi

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 62 - 63

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Translation Lookaside Buffers (TLBs) are critical to processor performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as chip multiprocessors (CMPs) become ubiquitous, TLB design must be re-evaluated. This paper is the first to propose and evaluate shared last-level (SLL) TLBs as an alternative to the commercial norm of private, per-core L2...

chapter

Exploiting criticality to reduce bottlenecks in distributed uniprocessors

B Robatmili, S Govindan, D Burger, S W Keckler

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 431 - 442

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads. The performance scalability of these systems, however, is limited due to partitioning overheads. This paper addresses two of the key performance scalability limitations of composable multicore systems. We present a critical path analysis revealing that communication needed for cross-core...

chapter

ACCESS: Smart scheduling for asymmetric cache CMPs

Xiaowei Jiang, A Mishra, Li Zhao, R Iyer, et al.

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 527 - 538

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require significant...

INFONA - science communication portal
