Mike Mantor

chapter

Many-thread aware instruction-level parallelism: Architecting shader cores for GPU computing

Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, et al.

2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 449-450

chapter

Shared memory multiplexing: A novel way to improve GPGPU throughput

Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, et al.

2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 283-292

On-chip shared memory (a.k.a. local data share) is a critical resource for many GPGPU applications. In current GPUs, the shared memory is allocated when a thread block (also called a workgroup) is dispatched to a streaming multiprocessor (SM) and is released when the thread block is completed. As a result, the limited capacity of shared memory becomes a bottleneck for a GPU to host a high number of...

chapter

A model-driven approach to warp/thread-block level GPU cache bypassing

Hongwen Dai, Chao Li, Huiyang Zhou, Saurabh Gupta, et al.

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6

The large number of memory requests issued by massive numbers of threads can easily cause cache contention and cache-miss-related resource congestion on GPUs. This paper proposes a simple yet effective performance model to estimate the impact of cache contention and resource congestion as a function of the number of warps/thread blocks (TBs) that bypass the cache. Then we design a hardware-based dynamic warp/thread-block...

chapter

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture

Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, et al.

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 121-130

Many-core architectures such as graphics processing units (GPUs) rely on thread-level parallelism (TLP) to overcome pipeline hazards. Consequently, each core in a many-core processor employs a relatively simple in-order pipeline with limited capability to exploit instruction-level parallelism (ILP). In this paper, we study the impact of ILP on the throughput-oriented many-core architecture, including...

chapter

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

Yi Yang, Ping Xiang, Mike Mantor, Huiyang Zhou

2012 41st International Conference on Parallel Processing (ICPP), pp. 329-339

Given the extraordinary computational power of modern graphics processing units (GPUs), general-purpose computation on GPUs (GPGPU) has become an increasingly important platform for high-performance computing. To better understand how well application developers have utilized GPU resources, and then to help them develop high-performance GPGPU code, we conduct an empirical study on...

chapter

AMD Radeon™ HD 7970 with graphics core next (GCN) architecture

Mike Mantor

2012 IEEE Hot Chips 24 Symposium (HCS), pp. 1-35

This article consists of a collection of slides from the author's conference presentation on AMD's Radeon HD 7970. Some of the specific topics discussed include: AMD products' core architecture; multimedia and display system specifications; GCN architectures that support multiple product configurations; VLIW SIMD versus GCN quad SIMD processing comparisons; and GCN architecture design.

chapter

CPU-assisted GPGPU on fused CPU-GPU architectures

Yi Yang, Ping Xiang, Mike Mantor, Huiyang Zhou

2012 IEEE 18th International Symposium on High Performance Computer Architecture (HPCA), pp. 1-12

This paper presents a novel approach to utilize the CPU resource to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In our model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar to the latest Intel Sandy Bridge and AMD accelerated processing unit (APU) platforms. In our proposed CPU-assisted...

chapter

2007 Hot Chips 19 AMD's Radeon™ HD 2900

Mike Mantor

2007 IEEE Hot Chips 19 Symposium (HCS), pp. 1-13

This article consists of a collection of slides from the author's conference presentation. Some of the specific areas/topics discussed include: AMD Radeon™ HD 2900 Highlights.

INFONA - science communication portal

Search results for: Mike Mantor
