2011 International Conference on Parallel Architectures and Compilation Techniques

A Unified Scheduler for Recursive and Task Dataflow Parallelism

Task dataflow languages simplify the specification of parallel programs by dynamically detecting and enforcing dependencies between tasks. These languages are, however, often restricted to a single level of parallelism. This language design is reflected in the runtime system, where a master thread explicitly generates a task graph and worker threads execute ready tasks and wake up their dependents...

No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs

Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan

2011 International Conference on Parallel Architectures and Compilation Techniques > 12 - 21

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Efficient contention management is the key to achieving scalable performance for multithreaded applications running on multicore systems. However, contention management policies provided by modern operating systems increase context-switches and lead to performance degradation for multithreaded applications under high loads. Moreover, this problem is exacerbated by the interaction between contention...

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Daniel Sanchez, David Lo, Richard M. Yoo, Jeremy Sugerman, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 22 - 32

Scheduling pipeline-parallel programs, defined as a graph of stages that communicate explicitly through queues, is challenging. When the application is regular and the underlying architecture can guarantee predictable execution times, several techniques exist to compute highly optimized static schedules. However, these schedules do not admit run-time load balancing, so variability introduced by the...

SPATL: Honey, I Shrunk the Coherence Directory

Hongzhou Zhao, Arrvindh Shriraman, Sandhya Dwarkadas, Vijayalakshmi Srinivasan

2011 International Conference on Parallel Architectures and Compilation Techniques > 33 - 44

One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which...

POPS: Coherence Protocol Optimization for Both Private and Shared Data

Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang

2011 International Conference on Parallel Architectures and Compilation Techniques > 45 - 55

As the number of cores in a chip multiprocessor (CMP) increases, the need for larger on-chip caches also increases in order to avoid creating a bottleneck at the off-chip interconnect. Utilization of these CMPs includes combinations of multithreading and multiprogramming, showing a range of sharing behavior, from frequent inter-thread communication to no communication. The goal of the CMP cache design...

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 56 - 67

Recently, Intel has introduced a research prototype many-core processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores on a single chip, and each core has its own L1 and L2 caches without any hardware support for cache coherence. It supports up to 64 GB of external memory that can be accessed by all cores and each core...

Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries

Bin Ren, Gagan Agrawal

2011 International Conference on Parallel Architectures and Compilation Techniques > 68 - 77

Programmer productivity considerations are increasing the popularity of interpreted languages like Python. At the same time, for applications where performance is important, these languages clearly fall short, even on uniprocessors. In addition, the use of dynamic data structures in a language like Python makes it very hard to use emerging libraries for enabling the execution on multi-core and many-core...

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Sungpack Hong, Tayo Oguntebi, Kunle Olukotun

2011 International Conference on Parallel Architectures and Compilation Techniques > 78 - 88

Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental...

A Heterogeneous Parallel Framework for Domain-Specific Languages

Kevin J. Brown, Arvind K. Sujeeth, Hyouk Joong Lee, Tiark Rompf, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 89 - 100

Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work we proposed the use of domain-specific...

PEPSC: A Power-Efficient Processor for Scientific Computing

Ganesh Dasika, Ankit Sethia, Trevor Mudge, Scott Mahlke

2011 International Conference on Parallel Architectures and Compilation Techniques > 101 - 110

The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices...

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling

Jungseob Lee, Vijay Sathisha, Michael Schulte, Katherine Compton, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 111 - 120

State-of-the-art graphics processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the product is often limited by a power constraint. Although the throughput can be increased with more cores for...

Cover Art

Title Page i

Title Page iii

Copyright Page

Table of Contents

Message from the General Chair

Message from the Program Chair

Conference Committees

Reviewers

A Unified Scheduler for Recursive and Task Dataflow Parallelism

No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

SPATL: Honey, I Shrunk the Coherence Directory

POPS: Coherence Protocol Optimization for Both Private and Shared Data

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

A Heterogeneous Parallel Framework for Domain-Specific Languages

PEPSC: A Power-Efficient Processor for Scientific Computing

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling
