Norm Rubin

chapter

Moka: Model-based concurrent kernel analysis

Leiming Yu, Xun Gong, Yifan Sun, Qianqian Fang, et al.

2017 IEEE International Symposium on Workload Characterization (IISWC), pp. 197-206

GPUs continue to increase the number of compute resources with each new generation. Many data-parallel applications have been re-engineered to leverage the thousands of cores on the GPU. But not every kernel can fully utilize all the resources available. Many applications contain multiple kernels that could potentially be run concurrently. To better utilize the massive resources on the GPU, device...
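As a rough illustration of why kernels that leave an SM partly idle can share it, the sketch below checks whether a set of thread blocks, possibly from different kernels, can be resident on one SM at the same time. It is a simplified resource check under stated assumptions, not the Moka model: the SM limits, block sizes, and the names `BlockDemand`, `SMLimits`, and `fits` are hypothetical, and real hardware adds further constraints such as maximum resident blocks and warps and register allocation granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDemand:
    """Per-block resource demand of a kernel (hypothetical numbers)."""
    threads: int
    smem_bytes: int
    registers: int  # registers for the whole block, not per thread


@dataclass(frozen=True)
class SMLimits:
    """Resources available on one SM (hypothetical numbers)."""
    threads: int
    smem_bytes: int
    registers: int


def fits(blocks, sm):
    """Return True if all `blocks` can be resident on one SM at once.

    Simplification: only threads, shared memory, and registers are summed;
    block-count limits and allocation granularity are ignored.
    """
    return (
        sum(b.threads for b in blocks) <= sm.threads
        and sum(b.smem_bytes for b in blocks) <= sm.smem_bytes
        and sum(b.registers for b in blocks) <= sm.registers
    )


# Hypothetical SM: 2048 threads, 48 KB shared memory, 64K registers.
sm = SMLimits(threads=2048, smem_bytes=48 * 1024, registers=65536)
# Kernel A is shared-memory heavy; kernel B uses no shared memory.
a = BlockDemand(threads=256, smem_bytes=16 * 1024, registers=8192)
b = BlockDemand(threads=512, smem_bytes=0, registers=16384)

print(fits([a] * 4, sm))        # False: a 4th A block exceeds 48 KB
print(fits([a] * 3 + [b], sm))  # True: B fills thread slots A leaves idle
```

Here kernel A alone is capped at three blocks by shared memory while 1280 of the 2048 thread slots stay empty; running kernel B concurrently uses those idle slots.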

chapter

GPU evolution: Will graphics morph into compute?

Norm Rubin

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), p. 1

chapter

Many-thread aware instruction-level parallelism: Architecting shader cores for GPU computing

Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, et al.

2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 449-450

chapter

Shared memory multiplexing: A novel way to improve GPGPU throughput

Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, et al.

2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 283-292

On-chip shared memory (a.k.a. local data share) is a critical resource for many GPGPU applications. In current GPUs, the shared memory is allocated when a thread block (also called a workgroup) is dispatched to a streaming multiprocessor (SM) and is released when the thread block is completed. As a result, the limited capacity of shared memory becomes a bottleneck for a GPU to host a high number of...
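Because each block holds its shared-memory allocation from dispatch until completion, the number of blocks an SM can host is bounded by its capacity divided by the per-block request. The sketch below shows that arithmetic under simplifying assumptions: shared memory is treated as the only limiting resource, and the 48 KB capacity, the cap of 8 resident blocks, and the name `resident_blocks` are hypothetical. It illustrates the bottleneck only; it is not the multiplexing scheme the paper proposes.

```python
def resident_blocks(smem_per_block, smem_per_sm, max_blocks_per_sm):
    """Thread blocks that can be resident on one SM when shared memory is
    the only limiting resource. Each block keeps its allocation from
    dispatch until it completes, so capacity is divided by the per-block
    request and rounded down, then clamped to the hardware block cap."""
    if smem_per_block == 0:
        return max_blocks_per_sm
    return min(max_blocks_per_sm, smem_per_sm // smem_per_block)


SMEM_PER_SM = 48 * 1024  # hypothetical 48 KB of shared memory per SM
MAX_BLOCKS = 8           # hypothetical cap on resident blocks per SM

for kb in (4, 16, 20):
    n = resident_blocks(kb * 1024, SMEM_PER_SM, MAX_BLOCKS)
    unused = SMEM_PER_SM - n * kb * 1024
    print(f"{kb:2d} KB/block -> {n} blocks, {unused // 1024} KB unused")
```

With these numbers a 20 KB request yields only 2 resident blocks and leaves 8 KB idle, even if each block touches its shared memory for only part of its lifetime.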

chapter

LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs

Jin Wang, Norm Rubin, Albert Sidelnik, Sudhakar Yalamanchili

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 583-595

Recent developments in GPU execution models and architectures have introduced dynamic parallelism to facilitate the execution of irregular applications where control flow and memory behavior can be unstructured, time-varying, and hierarchical. The changes brought about by this extension to the traditional bulk synchronous parallel (BSP) model also create new challenges in exploiting the current GPU...

chapter

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture

Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, et al.

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 121-130

Many-core architectures such as graphics processing units (GPUs) rely on thread-level parallelism (TLP) to overcome pipeline hazards. Consequently, each core in a many-core processor employs a relatively simple in-order pipeline with limited capability to exploit instruction-level parallelism (ILP). In this paper, we study the ILP impact on the throughput-oriented many-core architecture, including...
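The trade-off between TLP and ILP can be quantified with a back-of-envelope estimate based on Little's law: to keep an in-order pipeline issuing every cycle, enough independent instructions must be in flight to cover the latency, and each unit of per-warp ILP reduces the number of warps needed. The sketch below is an illustration of that simplified model, not the paper's methodology; the 20-cycle latency and the name `warps_needed` are hypothetical.

```python
import math


def warps_needed(latency_cycles, ilp, issue_per_cycle=1):
    """Back-of-envelope estimate based on Little's law.

    To issue `issue_per_cycle` instructions every cycle while each result
    takes `latency_cycles` to become available, roughly
    latency_cycles * issue_per_cycle independent instructions must be in
    flight. A warp with instruction-level parallelism `ilp` supplies that
    many independent instructions, so the rest must come from additional
    warps (thread-level parallelism).
    """
    in_flight = latency_cycles * issue_per_cycle
    return math.ceil(in_flight / ilp)


# Hypothetical 20-cycle dependent-instruction latency.
for ilp in (1, 2, 4):
    print(f"ILP={ilp}: about {warps_needed(20, ilp)} warps to keep the pipeline busy")
```

Under this model, raising per-warp ILP from 1 to 4 cuts the warps required from 20 to 5, which is why ILP support can matter even on a throughput-oriented design.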

chapter

Dynamic Thread Block Launch: A lightweight execution mechanism to support irregular applications on GPUs

Jin Wang, Norm Rubin, Albert Sidelnik, Sudhakar Yalamanchili

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 528-540

GPUs have been proven effective for structured applications that map well to the rigid 1D-3D grid of threads in modern bulk synchronous parallel (BSP) programming languages. However, less success has been encountered in mapping data-intensive irregular applications such as graph analytics, relational databases, and machine learning. Recently introduced nested device-side kernel launching functionality...

INFONA - science communication portal

Search results for: Norm Rubin
