Search results

Items from 21 to 40 out of 44 results

chapter

Cache streamization for high performance stream processor

Nan Wu, Mei Wen, Ju Ren, Yi He, more

2009 International Conference on High Performance Computing (HiPC) > 140 - 149

2009 16th International Conference on High Performance Computing (HiPC)

Due to high bandwidth demand on memory system of stream applications, most of stream processors use software-managed streaming memory. However, this memory disadvantages ease of programming, compatibility, and supporting irregular stream access, which hinder the usage of stream processor in broader application domains. Meanwhile, hardware-managed coherent caches overcome these shortcomings of software-managed...

chapter

Understanding the applicability of CMP performance optimizations on data mining applications

I. Jibaja, K.A. Shaw

2009 IEEE International Symposium on Workload Characterization (IISWC) > 227 - 236

2009 IEEE International Symposium on Workload Characterization (IISWC)

A major challenge to the creation of chip multiprocessors is designing the on-chip memory and communication resources to efficiently support parallel workloads. A variety of cache organizations, data management techniques, and hardware optimizations that take advantage of specific data characteristics have been developed to improve application performance. The success of these approaches depends on...

chapter

Cache Sharing Management for Performance Fairness in Chip Multiprocessors

Xing Zhou, Wenguang Chen, Weimin Zheng

2009 18th International Conference on Parallel Architectures and Compilation Techniques > 384 - 393

2009 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009)

Resource sharing can cause unfair and unpredictable performance of concurrently executing applications in Chip-Multiprocessors (CMP). The shared last-level cache is one of the most important shared resources because off-chip request latency may take a significant part of total execution cycles for data intensive applications. Instead of enforcing performance fairness directly, prior work addressing...

chapter

ITCA: Inter-task Conflict-Aware CPU Accounting for CMPs

C. Luque, M. Moreto, F.J. Cazorla, R. Gioiosa, more

2009 18th International Conference on Parallel Architectures and Compilation Techniques > 203 - 213

2009 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009)

Chip-multiprocessor (CMP) architectures are becoming more and more popular as an alternative to the traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities when accounting CPU utilization. This is due to the fact that the progress done by an application during an interval of time highly depends on the activity of the other applications...

chapter

An Effective Replacement Strategy of Cache Memory for an SMT Processor

Y. Ogasawara, H. Nakajo

2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools > 19 - 25

2009 12th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools (DSD 2009)

An SMT processor is designed to execute multiple threads simultaneously in order to gain higher performance with sharing resources such as ALUs and cache memory among several threads. However, sharing cache memory may cause thread conflict misses which degrades its performance. In this paper, an effective replacement strategy in which conflicts miss ratio among threads is controlled by limiting the...

chapter

GPU Accelerated Solver of Time-Dependent Air Pollutant Transport Equations

V. Simek, R. Dvorak, F. Zboril, V. Drabek

2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools > 707 - 713

2009 12th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools (DSD 2009)

Main objective of this paper is to outline possible ways how to achieve a substantial acceleration in case of advection-diffusion equation (A-DE) calculation, which is commonly used for a description of the pollutant behavior in atmosphere. A-DE is a kind of partial differential equation (PDE) and in general case it is usually solved by numerical integration due to its high complexity. These types...

chapter

Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor

Fenglong Song, Zhiyong Liu, Dongrui Fan, He Huang, more

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications > 571 - 576

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

On-chip many core architecture is an emerging and promising computation platform. High speed on-chip communication and abundant chipped resources are two outstanding advantages of this architecture, which provide an opportunity to implement efficient synchronization scheme. The practical execution efficiency of synchronization scheme is critical to this platform. However, there are few researches...

chapter

On Prediction Accuracy of Machine Learning Algorithms for Characterizing Shared L2 Cache Behavior of Programs on Multicore Processors

J.K. Rai, A. Negi, R. Wankar, K.D. Nayak

2009 First International Conference on Computational Intelligence, Communication Systems and Networks > 213 - 219

2009 First International Conference on Computational Intelligence, Communication Systems and Networks (CICSYN)

Information on a particular behavioral aspect of a program can be useful to know about the performance bottlenecks and can be utilized further to improve the performance of the system. It is observed that contention for shared L2 cache between programs running on a multi-core processor (MCP) is one of the performance bottlenecks. The utilization of the L2 cache by a program, while sharing it with...

chapter

A low power and variable-length FFT processor design for flexible MIMO OFDM systems

Chun-Lung Hung, Syu-Siang Long, Muh-Tian Shiue

2009 IEEE International Symposium on Circuits and Systems > 705 - 708

2009 IEEE International Symposium on Circuits and Systems - ISCAS 2009

In this paper, we present a low power and variable-length design of fast Fourier transform (FFT) processor for flexible MIMO-OFDM applications. In this work, mixed-radix-2/4/8 algorithm and new continuous-flow method are applied to achieve variable-length of 1K/2K/4K/8K points and in-order output. Furthermore, ping-pong cache memory architecture and optimized data scaling strategy are also applied...

chapter

Shared Memory Cache Organizations for Reconfigurable Computing Systems

P. Garcia, K. Compton

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines > 239 - 242

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines (FCCM 2009)

The best interface between CPUs and reconfigurable hardware in heterogeneous systems remains an open question. The trend in multi-core processors is to communicate through a shared memory hierarchy; but cache organizations that work best for general-purpose multi-core systems may not be best for heterogeneous systems. In this paper we explore a variety of cache topologies for connecting a CPU with...

chapter

Dacota: Post-silicon validation of the memory subsystem in multi-core designs

A. DeOrio, I. Wagner, V. Bertacco

2009 IEEE 15th International Symposium on High Performance Computer Architecture > 405 - 416

HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture

The number of functional errors escaping design verification and being released into final silicon is growing, due to the increasing complexity and shrinking production schedules of modern processor designs. Recent trends towards chip multiprocessors (CMPs) are exacerbating the problem because of their complex and sometimes non-deterministic memory subsystems, prone to subtle but devastating bugs...

chapter

Design and implementation of software-managed caches for multicores with local memory

Sangmin Seo, Jaejin Lee, Z. Sura

2009 IEEE 15th International Symposium on High Performance Computer Architecture > 55 - 66

HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture

Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies from different types of memory accesses add overhead and adversely affect instruction scheduling. Instead, the accelerator cores have internal local memory to place their code and data. Programmers of such heterogeneous multicore...

chapter

Online cache state dumping for processor debug

A. Vishnoi, P.R. Panda, M. Balakrishnan

2009 46th ACM/IEEE Design Automation Conference > 358 - 363

2009 46th ACM/IEEE Design Automation Conference (DAC)

Post-silicon processor debugging is frequently carried out in a loop consisting of several iterations of the following two key steps: (i) processor execution for some duration, followed by (ii) dumping out of the processor's internal state into an external logic analyzer for further offline processing. Internal state of the processor is dominated by the L2 cache. During the process of dumping the...

chapter

Architecting a chunk-based memory race recorder in Modern CMPs

G. Pokam, C. Pereira, K. Danne, R. Kassa, more

2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 576 - 586

2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2009)

Prior work on HW support for memory race recording piggybacks time stamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and...

chapter

A lightweight memory encryption cache design and implementation for embedded processor

Zhenglin Liu, Wenjie Huo, Xuecheng Zou, Yingyan Lin

Proceedings of the 2009 12th International Symposium on Integrated Circuits > 57 - 60

2009 12th International Symposium on Integrated Circuits (ISIC 2009)

Memory encryption offers a secure protection for the confidentiality of program and data. But implementing an encryption design for embedded processor is much difficult. As the embedded processor is highly constrained by the application requirement, the designers can't only concern with security. This paper proposes a new lightweight memory encryption cache (MEC) to obtain a balance among the performance,...

chapter

Characterizing the resource-sharing levels in the UltraSPARC T2 processor

V. Cakarevic, P. Radojkovic, J. Verdu, A. Pajuelo, more

2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 481 - 492

2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2009)

Thread level parallelism (TLP) has become a popular trend to improve processor performance, overcoming the limitations of extracting instruction level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in each chip design. Even if most of these combined-TLP...

chapter

The Design of Way-Prediction Scheme in Set-Associative Cache for Energy Efficient Embedded System

Chia-Ying Tseng, Hsin-Chu Chen

2009 WRI International Conference on Communications and Mobile Computing > 3 > 3 - 7

2009 WRI International Conference on Communications and Mobile Computing. CMC 2009

Embedded system develops rapidly, functions turn into more complicate, and multi-media applications are growing daily and they consume more electrical power. Therefore, how to improve stand-by time will become a very important issue. Related researches indicate that the power consumption of processor cache is accounted for a big proportion. Way-prediction and LRU (least recently used) algorithms improve...

chapter

Instruction prefetching using Basicblock prediction

K. Shyamala, P. Ravibabu, S.K. Lokhande, R. Reddy, more

2008 International Conference on Electronic Design > 1 - 4

2008 International Conference on Electronic Design. ICED 2008

Memory latency is a significant bottleneck in modern computer architectures, especially for commercial and multimedia applications. Instruction cache misses can severely limit the performance, due to advent of superscalar processors and multicore systems. Prefetching is one of the promising method to bridge the performance gap between CPU and DRAM speed. Although Instruction prefetching is a promising...

chapter

Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency

Haiming Liu, M. Ferdman, Jaehyuk Huh, D. Burger

2008 41st IEEE/ACM International Symposium on Microarchitecture > 222 - 233

2008 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41)

Data caches in general-purpose microprocessors often contain mostly dead blocks and are thus used inefficiently. To improve cache efficiency, dead blocks should be identified and evicted early. Prior schemes predict the death of a block immediately after it is accessed; however, these schemes yield lower prediction accuracy and coverage. Instead, we find that predicting the death of a block when it...

chapter

Implications of cache asymmetry on server consolidation performance

P. Apparao, R. Iyer, D. Newell

2008 IEEE International Symposium on Workload Characterization > 24 - 32

2008 IEEE International Symposium on Workload Characterization (IISWC)

Today's CMP platforms are designed to be symmetric in terms of platform resources such as shared caches. However, it is becoming increasingly important to understand the performance implications of asymmetric caches for two key reasons: (a) multi-workload scenarios such as server consolidation are a growing trend and contention for shared cache resources between workloads causes logical cache...

Keywords:
MICROPROCESSOR CHIPS
CACHE STORAGE
HARDWARE


INFONA - science communication portal
