2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)

chapter

Can lock-free and combining techniques co-exist? A novel approach on concurrent queue

Changwoo Min, Young Ik Eom

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 403

Concurrent queues are one of the most fundamental concurrent data structures. Most previous research focuses on how to avoid the contended hot spots, Head and Tail, and there are two contradictory approaches: (1) lock-free techniques [1], [2], which increase the degree of parallelism to improve performance and (2) combining techniques [3], where a single combining thread performs a batch operation...

chapter

Automatic OpenCL work-group size selection for multicore CPUs

Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 387 - 397

In this paper, we address the effect of the work-group size on the performance of OpenCL kernels. We propose a profiling-based algorithm that finds a good work-group size, in terms of performance, for the target multicore CPU architecture. Our algorithm reduces misses in the private L1 data cache and achieves load balancing between cores. It exploits the polyhedral model to estimate the working-set...

chapter

Automatic vectorization of tree traversals

Youngjoon Jo, Michael Goldfarb, Milind Kulkarni

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 363 - 374

Repeated tree traversals are ubiquitous in many domains such as scientific simulation, data mining and graphics. Modern commodity processors support SIMD instructions, and using these instructions to process multiple traversals at once has the potential to provide substantial performance improvements. Unfortunately these algorithms often feature highly diverging traversals which inhibit efficient...

chapter

[Copyright notice]

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > ii

chapter

[Front cover]

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > i

Presents the front cover or splash screen of the proceedings record.

chapter

Keynote talk: Towards automatic resource management in parallel architectures

Per Stenstrom

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 5

As we have embarked on the multi/many-core roadmap, resource management, especially managing parallelism, is left in the hands of programmers. A major challenge moving forward is how to off-load programmers from the daunting task of managing hardware resources in future parallel architectures to meet higher demands on performance and power efficiency. In this talk I will focus on a number of emerging...

chapter

Keynote talk: Parallel programming for mobile computing

Calin Cascaval

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 3

Personal computing is going mobile and applications are changing to adapt to take advantage of new opportunities offered by permanent availability and connectivity. Mobile devices are a significant departure from traditional computing. On one hand, they are very personal, always on, always connected. They promise to fulfill the promise of being the hub for our digital lives. On the other hand, they...

chapter

Task sampling: Computer architecture simulation in the many-core era

Thomas Grass

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 405

Chip Multi-Processors (CMPs) are evolving towards ever increasing core counts. Task-based programming models are a promising candidate for exploiting the parallelism offered by these machines. Simulation, the prevailing design methodology in computer architecture, is prohibitively time consuming when it comes to CMPs featuring thousands of cores. Sampled simulation [1], [2] is a standard technique for...

chapter

Managing shared last-level cache in a heterogeneous multicore processor

Vineeth Mekkat, Anup Holey, Pen-Chung Yew, Antonia Zhai

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 225 - 234

Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as GPU cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU due...

chapter

Memory-centric system interconnect design with Hybrid Memory Cubes

Gwangsun Kim, John Kim, Jung Ho Ahn, Jaeha Kim

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 145 - 155

Memory bandwidth has been one of the most critical system performance bottlenecks. As a result, the HMC (Hybrid Memory Cube) has recently been proposed to improve DRAM bandwidth as well as energy efficiency. In this paper, we explore different system interconnect designs with HMCs. We show that processor-centric network architectures cannot fully utilize processor bandwidth across different traffic...

chapter

A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors

Sandeep Navada, Niket K. Choudhary, Salil V. Wadhavkar, Eric Rotenberg

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 133 - 144

A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate to improve single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs comprised of non-monotonic core types where each core type is performance-optimized to different instruction-level behavior and hence cannot be ranked - different program phases achieve their highest performance on different...

chapter

Fairness-aware scheduling on single-ISA heterogeneous multi-cores

Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, et al.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 177 - 187

Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has focused on improving system throughput through scheduling. However, none of the prior work has looked into fairness. Yet, guaranteeing that all...

chapter

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 157 - 166

General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs). CTAs are...

chapter

Reshaping cache misses to improve row-buffer locality in multicore systems

Wei Ding, Jun Liu, Mahmut Kandemir, Mary Jane Irwin

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 235 - 244

Optimizing cache locality has always been important since the emergence of caches, and numerous cache locality optimization schemes have been published in compiler literature. However, in modern architectures, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems, and each bank has an attached row-buffer that holds the most-recently...

chapter

DANBI: Dynamic scheduling of irregular stream programs for many-core systems

Changwoo Min, Young Ik Eom

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 189 - 200

The stream programming model has received a lot of interest because it naturally exposes task, data, and pipeline parallelism. However, most prior work has focused on static scheduling of regular stream programs. Therefore, irregular applications cannot be handled in static scheduling, and the load imbalance caused by static scheduling faces scalability limitations in many-core systems. In this paper,...

chapter

Jigsaw: Scalable software-defined caches

Nathan Beckmann, Daniel Sanchez

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 213 - 224

Shared last-level caches, widely used in chip-multi-processors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access...

chapter

SMT-centric power-aware thread placement in chip multiprocessors

Augusto Vega, Alper Buyuktosunoglu, Pradip Bose

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 167 - 176

In Simultaneous Multi-Threading (SMT) chip multiprocessors (CMPs), thread placement is performed today in a largely power-unaware manner. For example, consolidation of active threads into fewer cores exposes opportunities for power savings that have not been addressed in prior work. The savings opportunity is especially high in the emerging context where per-core power gating (PCPG) is becoming viable...

chapter

ThermOS: System support for dynamic thermal management of chip multi-processors

Filippo Sironi, Martina Maggio, Riccardo Cattaneo, Giovanni F. Del Nero, et al.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 41 - 50

Constraining the temperature of computing systems has become a dominant aspect in the design of integrated circuits. The supply voltage decrease has lost its pace even though the feature size is shrinking constantly. This results in an increased number of transistors per unit of area and hence a growing power density. Researchers started investigating dynamic thermal management techniques to address...

chapter

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design

Bin Wang, Bo Wu, Dong Li, Xipeng Shen, et al.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 93 - 102

Hybrid memory designs, such as DRAM plus Phase Change Memory (PCM), have shown some promise for alleviating power and density issues faced by traditional memory systems. But previous studies have concentrated on CPU systems with a modest level of parallelism. This work studies the problem in a massively parallel setting. Specifically, it investigates the special implications to hybrid memory imposed...

chapter

Coordinated power-performance optimization in manycores

Hiroshi Sasaki, Satoshi Imamura, Koji Inoue

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 51 - 61

Optimizing performance in multiprogrammed environments, especially for workloads composed of multi-threaded programs, is a desired feature of the runtime management system in future manycore processors. At the same time, power capping capability is required in order to improve the reliability of microprocessor chips while reducing the costs of power supply and thermal budgeting. This paper presents...
