Search results

chapter

Characterizing multi-threaded applications based on shared-resource contention

T Dey, Wei Wang, J W Davidson, M L Soffa

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 76 - 86

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most of the recent research addresses contention for shared resources by single-threaded applications. However,...

chapter

Link-time optimization for power efficiency in a tagless instruction cache

T M Jones, S Bartolini, J Maebe, D Chanet

International Symposium on Code Generation and Optimization (CGO 2011) > 32 - 41

2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011)

The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access as tags and data banks from all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reduce this overhead remove unnecessary...

chapter

Technology scaling: A system perspective

Proceedings of 2011 International Symposium on VLSI Design, Automation and Test > 1 - 16

2011 International Symposium on VLSI Design, Automation and Test (VLSI-DAT 2011)

A collection of slides from the author's conference presentation is given. The following topics are discussed: VLSI design, automation & test; technology scaling; Moore's law; platform segment characteristics; dynamic platform control; dynamic adaptation & reconfiguration; resilient platforms; voltage-frequency range limiters; voltage-frequency margins; fine-grain power management; voltage...

chapter

Custom FPGA-based micro-architecture for streaming computing

J C Alves, P C Diniz

2011 VII Southern Conference on Programmable Logic (SPL) > 51 - 56

2011 VII Southern Conference on Programmable Logic (SPL)

This paper describes a micro-architecture for a custom programmable FPGA-based processor, with direct support for streaming and vector computations relying on custom cache memory storage. The processor combines a custom data-path with several parallel data ports for accessing operands in streaming mode thus efficiently supporting nested looping constructs found in high-level languages while mitigating...

chapter

Embarrassingly scalable database systems

A Ailamaki

2011 IEEE 27th International Conference on Data Engineering > 1

2011 27th IEEE International Conference on Data Engineering (ICDE 2011)

Summary form only given. Database systems have long optimized for parallel execution; the research community has pursued parallel database machines since the early '80s, and several key ideas from that era underlie the design and success of commercial database engines today. Computer microarchitecture, however, has shifted drastically during the intervening decades. Until the end of the 20th century...

chapter

Communication on the Fly for Hierarchical Systems of Chip Multi-processors

M Tudruj, L Masko

2011 Sixth International Symposium on Parallel Computing in Electrical Engineering > 19 - 24

2011 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011)

Systems based on many Chip Multi-Processor (CMP) modules interconnected by global networks now constitute a feasible solution, which revives the challenges of massively parallel systems. The paper presents new methods for data communication inside CMP modules and for inter-CMP-module data communication. Inside CMP modules data communication through shared variables is improved by the use of...

chapter

A Software-Pipelined Approach to Multicore Execution of Timing Predictable Multi-threaded Hard Real-Time Tasks

M Paolieri, E Quiñones, F J Cazorla, J Wolf, more

2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing > 233 - 240

2011 IEEE 14th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC 2011)

Multicore processors can deliver higher performance than single-core processors by exploiting thread level parallelism (TLP): applications are split into independent threads, each of which is mapped into a different core, reducing the execution time and potentially its worst-case execution time (WCET). Unfortunately, inter-thread interferences generated by simultaneous accesses to shared resources...

chapter

A Time-Predictable Object Cache

M Schoeberl

2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing > 99 - 105

2011 IEEE 14th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC 2011)

Static cache analysis for data allocated on the heap is practically impossible for standard data caches. We propose a distinct object cache for heap allocated data. The cache is highly associative to track symbolic object addresses in the static analysis. Cache lines are organized to hold single objects and individual fields are loaded on a miss. This cache organization is statically analyzable and...

chapter

Private cache partitioning: A method to reduce the off-chip missrate of concurrently executing applications in Chip-Multiprocessors

Li Hao, Liu Tao, Liu Guanghui, Xie Lunguo

2011 3rd International Conference on Computer Research and Development > 4 > 254 - 259

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

When several applications run on Chip-Multiprocessors (CMPs), allocating the on-chip cache capacity among these competing applications is a problem. Cache partitioning is commonly used to solve this problem. Existing cache partitioning schemes are either dedicated to the shared design or partition the last level cache depending on limited memory information. This paper presents Private...

chapter

Building efficient transactional memory support based on snoopy coherence

Zhengbin Pang, Shaogang Wang, Dan Wu, Jun Zhang, more

2011 3rd International Conference on Computer Research and Development > 1 > 46 - 50

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

Transactional memory (TM) is a new shared resource synchronization mechanism which was proposed to ease the difficulty of parallel programming. Currently, most hardware transactional memory systems leverage an extended directory-based cache coherence protocol to resolve transaction conflicts; little research has been conducted on extending a snoopy-coherence-based chip multi-processor with TM support...

chapter

Maximizing throughput of temperature-constrained multi-core systems with 3D-stacked cache memory

Kyungsu Kang, Jongpil Jung, Sungjoo Yoo, Chong-Min Kyung

2011 12th International Symposium on Quality Electronic Design > 1 - 6

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

Three-dimensional integration has the potential to increase integration density and to reduce communication latency of chip-multiprocessors (CMPs). However, high power density (i.e., power dissipation per unit volume) due to the high integration incurs temperature-related problems in reliability, power consumption, performance, and system cooling cost. In this paper, we propose a design-time solution...

chapter

Architectural support predicting method for CMP scheduling

Gangyong Jia, Xi Li, Xuehai Zhou, Dong Dai

2011 3rd International Conference on Computer Research and Development > 4 > 10 - 14

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

On a CMP (Chip Multi-Processor) architecture, cache sharing impacts threads non-uniformly: some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as decreased throughput and cache thrashing. This paper proposes an architectural support predicting method (ASPM) to predict inter-thread cache contention, and schedules threads based on...

chapter

Automatic Feedback Control of Shared Hybrid Caches in 3D Chip Multiprocessors

A Sharifi, M Kandemir

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 393 - 400

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

3D integration enables building caches from different types of technologies such as SRAM, Magnetic RAM (MRAM), DRAM, and Phase-change RAM (PRAM). Hybrid cache architectures (HCAs) have been proposed to take advantage of the benefits offered by these types of technologies. Employing this novel cache architecture to build shared caches in chip multiprocessors (CMPs) can lead to significant performance...

chapter

MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy

S Srikantaiah, E Kultursay, Tao Zhang, M Kandemir, more

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 231 - 242

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. MorphCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist...

chapter

CHIPPER: A low-complexity bufferless deflection router

C Fallin, C Craik, O Mutlu

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 144 - 155

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router architectures to reduce hardware cost. In particular, recent work proposes bufferless deflection routing to eliminate router buffers. The high cost of...

chapter

CloudCache: Expanding and shrinking private caches

Hyunjin Lee, Sangyeun Cho, B R Childers

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 219 - 230

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

The number of cores in a single chip multiprocessor is expected to grow in coming years. Likewise, aggregate on-chip cache capacity is increasing fast and its effective utilization is becoming ever more important. Furthermore, available cores are expected to be underutilized due to the power wall and highly heterogeneous future workloads. This trend makes existing L2 cache management techniques less...

chapter

Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors

H R Ghasemi, S C Draper, Nam Sung Kim

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 38 - 49

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

To date dynamic voltage/frequency scaling (DVFS) has been one of the most successful power-reduction techniques. However, ever-increasing process variability reduces the reliability of static random access memory (SRAM) at low voltages. This limits voltage scaling to a minimum operating voltage (V_DDMIN). Larger SRAM cells, that are less sensitive to process variability, allow the use of lower V_DDMIN...

chapter

MOPED: Orchestrating interprocess message data on CMPs

Junli Gu, S S Lumetta, R Kumar, Yihe Sun

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 111 - 120

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, cache resources per core are fewer, and must be shared carefully to avoid poor utilization due to conflicts and pollution. Explicit motion of data in these architectures, such as message passing, can provide hints about program behavior that can be used to hide latency and improve cache behavior. However, to make...

chapter

A 32nm Westmere-EX Xeon® enterprise processor

S Sawant, U Desai, G Shamanna, L Sharma, more

2011 IEEE International Solid-State Circuits Conference > 74 - 75

2011 IEEE International Solid-State Circuits Conference (ISSCC 2011)

The next-generation enterprise Xeon® processor consists of 10 Westmere 32nm cores and a shared inclusive L3 cache (LLC) integrated on a monolithic die, with link-based I/Os. This paper focuses on the innovations and circuit optimizations over the predecessor targeting idle power reduction, robust high-speed I/O links, and performance per watt improvements. The processor is implemented in 32nm CMOS...

chapter

A 32nm 3.1 billion transistor 12-wide-issue Itanium® processor for mission-critical servers

R J Riedlinger, R Bhatia, L Biro, B Bowhill, more

2011 IEEE International Solid-State Circuits Conference > 84 - 86

2011 IEEE International Solid-State Circuits Conference (ISSCC 2011)

The next generation in the Intel® Itanium® processor family, code named Poulson, has eight multi-threaded 64-bit cores. Poulson is socket compatible with the current Intel® Itanium® Processor 9300 series (Tukwila). The new design integrates a ring-based system interface derived from portions of previous Xeon® and Itanium® processors, and includes 32MB of Last Level Cache (LLC). The processor is...

INFONA - science communication portal

