Search results

Items from 1 to 20 out of 44 results

chapter

Characterizing multi-threaded applications based on shared-resource contention

T Dey, Wei Wang, J W Davidson, M L Soffa

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 76 - 86

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most recent research addresses contention for shared resources by single-threaded applications. However,...

chapter

Link-time optimization for power efficiency in a tagless instruction cache

T M Jones, S Bartolini, J Maebe, D Chanet

International Symposium on Code Generation and Optimization (CGO 2011) > 32 - 41

2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011)

The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access as tags and data banks from all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reduce this overhead remove unnecessary...

chapter

Private cache partitioning: A method to reduce the off-chip missrate of concurrently executing applications in Chip-Multiprocessors

Li Hao, Liu Tao, Liu Guanghui, Xie Lunguo

2011 3rd International Conference on Computer Research and Development > 4 > 254 - 259

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

When several applications run on Chip-Multiprocessors (CMPs), allocating the on-chip cache capacity among these competing applications is a problem. Cache partitioning is commonly used to solve this problem. Existing cache partitioning schemes either are dedicated to shared designs or partition the last-level cache based on limited memory information. This paper presents Private...

chapter

Building efficient transactional memory support based on snoopy coherence

Zhengbin Pang, Shaogang Wang, Dan Wu, Jun Zhang, more

2011 3rd International Conference on Computer Research and Development > 1 > 46 - 50

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

Transactional memory (TM) is a new shared-resource synchronization mechanism proposed to ease the difficulty of parallel programming. Currently, most hardware transactional memory systems leverage an extended directory-based cache coherence protocol to resolve transaction conflicts; little research has been conducted on extending a snoopy-coherence-based chip multiprocessor with TM support...

chapter

CHIPPER: A low-complexity bufferless deflection router

C Fallin, C Craik, O Mutlu

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 144 - 155

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router architectures to reduce hardware cost. In particular, recent work proposes bufferless deflection routing to eliminate router buffers. The high cost of...

chapter

MOPED: Orchestrating interprocess message data on CMPs

Junli Gu, S S Lumetta, R Kumar, Yihe Sun

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 111 - 120

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, each core has fewer cache resources, and these must be shared carefully to avoid poor utilization due to conflicts and pollution. Explicit motion of data in these architectures, such as message passing, can provide hints about program behavior that can be used to hide latency and improve cache behavior. However, to make...

chapter

ACCESS: Smart scheduling for asymmetric cache CMPs

Xiaowei Jiang, A Mishra, Li Zhao, R Iyer, more

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 527 - 538

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require significant...

chapter

Address Remapping for Static NUCA in NoC-Based Degradable Chip-Multiprocessors

Ying Wang, Lei Zhang, Yinhe Han, Huawei Li, more

2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing > 70 - 76

2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC 2010)

Large-scale Chip-Multiprocessors (CMPs) generally employ a Network-on-Chip (NoC) to connect the last-level cache (LLC), which is typically organized as distributed NUCA (non-uniform cache access) arrays for scalability and efficiency. On the other hand, aggressive technology scaling induces severe reliability problems, causing failures of on-chip components (e.g., cores, cache banks, routers) due to manufacture...

chapter

Parichute: Generalized Turbocode-Based Error Correction for Near-Threshold Caches

T N Miller, R Thomas, J Dinan, B Adcock, more

2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture > 351 - 362

2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2010)

Energy efficiency is a primary concern for microprocessor designers. A very effective approach to improving the energy efficiency of a chip is to lower its supply voltage to very close to the transistor's threshold voltage, into what is called the near-threshold region. This reduces power consumption dramatically but also decreases reliability by orders of magnitude, especially for SRAM structures such...

chapter

Open Source Precision Timed Soft Processor for Cyber Physical System Applications

S Craven, D Long, J Smith

2010 International Conference on Reconfigurable Computing and FPGAs > 448 - 451

2010 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2010)

Modern processor architectures sacrifice timing predictability to improve average performance. Branch prediction, out-of-order execution, and multi-level cache hierarchies complicate accurate execution time estimates. The timing demands of Cyber Physical Systems (CPS) have led some to propose new processor architectures, including Precision Timed (PRET) processors, which simplify analysis of execution...

chapter

Partitioning mechanism based on dynamic Allocation of Data entries for chip multiprocessors

Yan Pei-Xiang, Jiang Jiang, Yang Xian-Ju, Zhang Min-Xuan

5th International Conference on Computer Sciences and Convergence Information Technology > 472 - 479

2010 5th International Conference on Computer Sciences and Convergence Information Technology (ICCIT 2010)

The LRU replacement strategy, which exploits the locality of blocks within the same set, is poorly suited to managing L2 cache resources because temporal locality has already been filtered out by the L1 caches. Instead, a reuse replacement strategy exploits the reuse characteristics of blocks across the entire cache, giving it more potential to improve cache resource utilization. We use reuse replacement to manage L2 cache resources in chip...

chapter

Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses

A Sandberg, D Eklöv, E Hagersten

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Contention for shared cache resources has been recognized as a major bottleneck for multicores, especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions are largely unused due to a lack of understanding of how to best leverage them. This paper introduces a classification of applications into four cache usage...

chapter

Insertion policy selection using Decision Tree Analysis

S Khan, D A Jimenez

2010 IEEE International Conference on Computer Design > 106 - 111

2010 IEEE International Conference on Computer Design (ICCD 2010)

The last-level cache (LLC) mitigates the impact of long memory access latencies in today's microarchitectures. The insertion policy in the LLC has a significant impact on cache efficiency. A fixed insertion policy can allow useless blocks to remain in the cache longer than necessary, resulting in inefficiency. We introduce insertion policy selection using Decision Tree Analysis (DTA). The technique...

chapter

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments

J Treibig, G Hager, G Wellein

2010 39th International Conference on Parallel Processing Workshops > 207 - 216

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: Probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter...

chapter

Composite Pseudo-Associative Cache for Mobile Processors

L D Bobbala, J Salvatierra, Byeong Kil Lee

2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems > 394 - 396

18th IEEE/ACM International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS 2010)

Multi-core trends are becoming dominant, creating sophisticated and complicated cache structures. Larger shared level-2 (L2) caches are also needed for higher cache performance. One of the easiest ways to design cache memory for increased performance is to double the cache size. However, cache size directly affects area and power consumption. Especially in mobile processors,...

chapter

Dynamic Fair Cache Partitioning for Chip Multiprocessor

Juan Fang, Jiang Pu

2010 Third International Joint Conference on Computational Science and Optimization > 2 > 283 - 287

Third International Joint Conference on Computational Sciences and Optimization (CSO 2010)

Fairness is a critical issue: if the hardware does not provide fair cache sharing, serious problems such as thread starvation and priority inversion can arise and render the Operating System (OS) scheduler ineffective. To improve the fairness of the shared cache among threads in a chip multiprocessor, a dynamic fair partitioning policy for the shared cache is proposed in this...

chapter

Balanced locality-aware packet schedule algrorithm on multi-core network processor

Pengcheng He, Jinlin Wang, Haojiang Deng, Wu Zhang

2010 2nd International Conference on Future Computer and Communication > 3 > V3-248 - V3-252

2010 2nd International Conference on Future Computer and Communication (ICFCC 2010)

Previous work has shown that processor affinity is an effective way to improve the performance of SMP systems. A detailed analysis of cache characteristics on network processors is carried out. The results show that network packet processing can also use cache affinity to improve performance by reducing the miss rates of the instruction and data caches. A schedule algorithm for multi-core...

chapter

Towards Smaller-Sized Cache for Mobile Processors Using Shared Set-Associativity

Naveen Davanam, Byeong Kil Lee

2010 Seventh International Conference on Information Technology: New Generations > 1 - 6

Seventh International Conference on Information Technology: New Generations (ITNG 2010)

As multi-core trends become dominant, cache structures are growing more complex and larger shared level-2 caches are needed. Multi-core design is also being applied to mobile processors. To achieve higher cache performance, lower power consumption, and smaller chip area in multi-core mobile processors, the cache configuration should be reorganized and re-analyzed. The MID (Mobile Internet Devices) which...

chapter

Structuring the execution of OpenMP applications for multicore architectures

Francois Broquedis, Olivier Aumage, Brice Goglin, Samuel Thibault, more

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 10

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user friendliness of shared memory on the one side, and memory access scalability and efficiency on the other side. However, getting high performance out of such machines requires a dynamic mapping of application tasks and data onto the underlying...

chapter

A reconfigurable cache memory with heterogeneous banks

Domingo Benitez, Juan C Moure, Dolores Rexachs, Emilio Luque

2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010) > 825 - 830

2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)

The optimal size of a large on-chip cache can be different for different programs: at some point, the reduction of cache misses achieved when increasing cache size hits diminishing returns, while the higher cache latency hurts performance. This paper presents the Amorphous Cache (AC), a reconfigurable L2 on-chip cache aimed at improving performance as well as reducing energy consumption. AC is composed...

Keywords:
MICROPROCESSOR CHIPS
CACHE STORAGE
HARDWARE

Keywords

COMPUTER ARCHITECTURE (17)
BENCHMARK TESTING (14)
MULTIPROCESSING SYSTEMS (12)
PROGRAM PROCESSORS (12)
RADIATION DETECTORS (8)
CHIP MULTIPROCESSORS (7)
DATA MINING (7)
MULTI-THREADING (7)
REGISTERS (7)
COHERENCE (6)
MEMORY ARCHITECTURE (6)
PREFETCHING (6)
SHARED MEMORY SYSTEMS (6)
ARRAYS (5)
INDEXES (5)
MULTICORE PROCESSING (5)
PROCESSOR SCHEDULING (5)
RESOURCE ALLOCATION (5)
SYSTEM-ON-A-CHIP (5)
YARN (5)
CHIP MULTIPROCESSOR (4)
EMBEDDED SYSTEMS (4)
HISTORY (4)
INSTRUCTION SETS (4)
INTEGRATED CIRCUIT DESIGN (4)
L2 CACHE (4)
LINUX (4)
PIPELINES (4)
POWER CONSUMPTION (4)
POWER DEMAND (4)
SOFTWARE (4)
TILES (4)
LOGIC GATES (3)
LOW-POWER ELECTRONICS (3)
MAGNETIC CORES (3)
MEMORY MANAGEMENT (3)
MULTI-CORE (3)
OPTIMIZATION (3)
PARALLEL ARCHITECTURES (3)
PERFORMANCE EVALUATION (3)
PREDICTION ALGORITHMS (3)
PROTOCOLS (3)
SOCKETS (3)
ACCURACY (2)
ALGORITHM DESIGN AND ANALYSIS (2)
ANALYTICAL MODELS (2)
APPLICATION PROGRAM INTERFACES (2)
BANDWIDTH (2)
CACHE (2)
CACHE MEMORY (2)
CACHE PERFORMANCE (2)
CACHE TOPOLOGY (2)
CHIP-MULTIPROCESSOR (2)
CHIP-MULTIPROCESSORS (2)
CMP (2)
COMPLEXITY THEORY (2)
CONTENT-ADDRESSABLE STORAGE (2)
DATA CACHE (2)
DATA STRUCTURES (2)
DRAM (2)
DRAM CHIPS (2)
DYNAMIC SCHEDULING (2)
EMBEDDED SYSTEM (2)
ENERGY CONSUMPTION (2)
ENERGY EFFICIENCY (2)
ENGINES (2)
EQUATIONS (2)
FAIRNESS (2)
INTERRUPTS (2)
KERNEL (2)
LOAD BALANCING (2)
MATHEMATICAL MODEL (2)
MOBILE COMMUNICATION (2)
MULTICORE CHIPS (2)
NETWORK-ON-CHIP (2)
OPERATING SYSTEM (2)
OPERATING SYSTEMS (COMPUTERS) (2)
PARALLEL PROCESSING (2)
PARTITIONING ALGORITHMS (2)
POST-SILICON VALIDATION (2)
POWER CONSUMPTION REDUCTION (2)
POWER EFFICIENCY (2)
REPLACEMENT STRATEGY (2)
RESOURCE MANAGEMENT (2)
SCHEDULES (2)
SERVERS (2)
STRESS (2)
SYNCHRONIZATION (2)
TOPOLOGY (2)
VIRTUAL MACHINES (2)
1MB 16 WAY SET ASSOCIATIVE LAST LEVEL CACHE (1)
4-CORE PROCESSOR SYSTEM (1)
A-DE (1)
ACCELERATION (1)
ACCESS (1)
ACCESS LATENCIES (1)
ADAPTATION MODEL (1)

INFONA - science communication portal
