Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

A hardware/software co-designed processor transparently supports a ubiquitous ISA (e.g. x86) with diversified and innovative microarchitectural implementations. It leverages co-designed HW features and dynamic binary translation (DBT) SW to morph existing binary programs to scale performance and save power. On such systems, the portable bytecode of modern dynamic languages (e.g. Java, JavaScript,...

A polynomial spilling heuristic: Layered allocation

Boubacar Diouf, Albert Cohen, Fabrice Rastello

Register allocation is one of the most important and one of the oldest compiler optimizations. It aims to map temporary variables to machine registers, and defaults to explicit load/store from memory when necessary. The latter option is referred to as spilling. This paper addresses the minimization of the spill code overhead, one of the difficult problems in register allocation. We devised a heuristic,...

Bandwidth Bandit: Quantitative characterization of memory contention

David Eklov, Nikos Nikoleris, David Black-Schaffer, Erik Hagersten

On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing. While there are...

SIMD parallelization of applications that traverse irregular data structures

Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, more

Fine-grained data parallelism is increasingly common in mainstream processors in the form of longer vectors and on-chip GPUs. This paper develops support for exploiting such data parallelism for a class of non-numeric, non-graphic applications, which perform computations while traversing many independent, irregular data structures. While the traversal of any one irregular data structure does not give...

Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs

Junjie Lai, Andre Seznec

In this paper, we present an approach to estimate GPU applications' performance upper bound based on algorithm analysis and assembly code level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization space is left for SGEMM and why. According...

Instant profiling: Instrumentation sampling for profiling datacenter applications

Hyoun Kyu Cho, Tipp Moseley, Richard Hank, Derek Bruening, more

Profile-guided optimization possesses huge potential to save costs for datacenters. Hardware performance monitoring units enable profiling with negligible overhead and they have been proven to be effective to help programmers find code regions to optimize by monitoring datacenter applications continuously on live traffic. However, these hardware features are inflexible and often buggy, limiting the...

Effective fault localization based on minimum debugging frontier set

Feng Li, Wei Huo, Congming Chen, Lujie Zhong, more

In this paper, we present a novel state-based fault-localization approach called DelFal. Assuming the availability of the execution trace which leads to the reported program execution failure, this new approach successively selects sets of trace points to allow the performance of efficient automatic explorations on program execution states in order to help the developer locate programming faults responsible...

Portable mapping of data parallel programs to OpenCL for heterogeneous systems

Dominik Grewe, Zheng Wang, Michael F. P. O'Boyle

General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high level language...

Hydra: Automatic algorithm exploration from linear algebra equations

Alexandre X. Duchateau, David Padua, Denis Barthou

Hydra accepts an equation written in terms of operations on matrices and automatically produces highly efficient code to solve these equations. Processing of the equation starts by tiling the matrices. This transforms the equation into either a single new equation containing terms involving tiles or into multiple equations some of which can be solved in parallel with each other. Hydra continues transforming...

Smart, adaptive mapping of parallelism in the presence of external workload

Murali Krishna Emani, Zheng Wang, Michael F. P. O'Boyle

Given the wide-scale adoption of multi-cores in mainstream computing, parallel programs rarely execute in isolation and have to share the platform with other applications that compete for resources. If the external workload is not considered when mapping a program, it leads to a significant drop in performance. This paper describes an automatic approach that combines compile-time knowledge of the...

Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms

Jie Yan, Guangming Tan, Xiuxia Zhang, Erlin Yao, more

For graph traversal applications, fine synchronization is required to exploit massive fine parallelism. However, in the conventional solution using fine-grained locks, locks themselves suffer huge memory cost as well as poor locality for inherent irregular access to vertices. In this paper, we propose a novel fine lock solution, vLock. The key idea is lock virtualization that maps the huge logical...

On the platform specificity of STM instrumentation mechanisms

Wenjia Ruan, Yujie Liu, Chao Wang, Michael Spear

Supporting atomic blocks (e.g., Transactional Memory (TM)) can have far-reaching effects on language design and implementation. While much is known about the language-level semantics of TM and the performance of algorithms for implementing TM, little is known about how platform characteristics affect the manner in which a compiler should instrument code to achieve efficient transactional behavior...

INFONA - science communication portal

Organizing committee

External reviewers

Message from the general chairs

Table of content

[Front cover]

[Copyright notice]

Contributors

Acceldroid: Co-designed acceleration of Android bytecode
