Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on

chapter

SPATL: Honey, I Shrunk the Coherence Directory

Hongzhou Zhao, Arrvindh Shriraman, Sandhya Dwarkadas, Vijayalakshmi Srinivasan

2011 International Conference on Parallel Architectures and Compilation Techniques > 33 - 44

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which...

chapter

POPS: Coherence Protocol Optimization for Both Private and Shared Data

Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang

2011 International Conference on Parallel Architectures and Compilation Techniques > 45 - 55

As the number of cores in a chip multiprocessor (CMP) increases, the need for larger on-chip caches also increases in order to avoid creating a bottleneck at the off-chip interconnect. Utilization of these CMPs includes combinations of multithreading and multiprogramming, showing a range of sharing behavior, from frequent inter-thread communication to no communication. The goal of the CMP cache design...

chapter

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 56 - 67

Recently, Intel has introduced a research prototype manycore processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores on a single chip, and each core has its own L1 and L2 caches without any hardware support for cache coherence. It supports up to 64GB of external memory that can be accessed by all cores and each core...

chapter

DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 155 - 166

For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors"; e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink hardware from the ground up to exploit opportunities...

chapter

Scalable Proximity-Aware Cache Replication in Chip Multiprocessors

Chongmin Li, Haixia Wang, Yibo Xue, Dongsheng Wang, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 191 - 192

We propose Proximity-Aware cache Replication (PAR), an LLC replication technique that elegantly integrates an intelligent cache replication placement mechanism and a hierarchical directory-based coherence protocol into one cost-effective and scalable design. Simulation results on a 64-core CMP show that PAR can achieve a 12% speedup over the baseline shared cache design with SPLASH2 and PARSEC workloads...

chapter

Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory

Anurag Negi, Per Stenstrom, Ruben Titos-Gil, Manuel E. Acacio, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 203 - 204

Lazy hardware transactional memory (HTM) allows better utilization of available concurrency in transactional workloads than eager HTM, but poses challenges at commit time due to the requirement of en-masse publication of speculative updates to global system state. Early conflict detection can be employed in lazy HTM designs to allow non-conflicting transactions to commit in parallel. Though this has...

chapter

Sampling Temporal Touch Hint (STTH) Inclusive Cache Management Policy

Yingying Tian, Daniel A. Jimenez

2011 International Conference on Parallel Architectures and Compilation Techniques > 209

chapter

A Software-Managed Coherent Memory Architecture for Manycores

Jungho Park, Choonki Jang, Jaejin Lee

2011 International Conference on Parallel Architectures and Compilation Techniques > 213

Cache-coherent Non-Uniform Memory Access (cc-NUMA) architectures have been widely used for chip multiprocessors (CMPs). However, they require complicated hardware to properly handle the cache coherence problem. Moreover, they generate heavy on-chip network traffic due to coherence enforcement. In this work, we propose a simple software-managed coherent memory architecture for manycores. Our memory...

chapter

An Architecture to Enable Lifetime Full Chip Testability in Chip Multiprocessors

Rance Rodrigues, Israel Koren, Sandip Kundu

2011 International Conference on Parallel Architectures and Compilation Techniques > 219

Technology scaling has led to a tremendous increase in the packing density of transistors. However, these small transistors are susceptible to certain impediments that were not present earlier. Manufacturability suffers due to trailing lithography technology which does not scale well with transistor technology. Increased leakage current has reduced effectiveness of burn-in tests. Infant mortality...

chapter

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory

Carlos Villavieja, Vasileios Karakostas, Lluis Vilanova, Yoav Etsion, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 340 - 349

Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, are paramount to performance scalability. The emergence of chip multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to the forefront. TLBs are kept coherent at the software level by the operating system...
