In this paper, we investigate static memory access predictability in GPGPU workloads at the thread block granularity. We first show that a significant share of accessed memory addresses can be predicted using thread block identifiers. Building on this observation, we introduce a hardware-software prefetching scheme to reduce the average memory access time. Our scheme issues the memory requests of a thread block before that block starts execution. It relies on a static analyzer to parse the kernel and identify predictable memory accesses; runtime API calls pass this information to the hardware, which then dynamically prefetches the data of each thread block. In our scheme, software (the static analyzer and API calls) controls prefetch accuracy, while hardware controls prefetch timeliness. We introduce a few machine models to explore the design space and the performance potential of the scheme. Our evaluation shows that the scheme can achieve a performance improvement of 59% over a baseline without prefetching.