Irregular applications, by their very nature, suffer from poor data locality. This often results in high cache miss rates and frequent long-latency accesses to off-chip memory. Historically, long latencies have been addressed in two ways: (1) latency mitigation using large cache hierarchies, or (2) latency masking, where threads relinquish control after issuing a memory request. Multithreaded CPUs are designed for a fixed maximum number of threads, tailored to an average application. FPGAs, however, can be customized to specific applications. Their massive parallelism is well known and ideally suited to dynamically managing hundreds or thousands of threads. Multithreading, in essence, trades memory bandwidth for latency tolerance; to achieve high throughput, the system must therefore support a large memory bandwidth. Many irregular applications, however, must rely on inter-thread synchronization for parallel execution, and in-memory synchronization suffers from very long memory latencies. In this paper we describe the use of CAMs (Content Addressable Memories) as synchronizing caches for hardware multithreading. We demonstrate and evaluate this mechanism using graph breadth-first search (BFS).
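To make the mechanism concrete, the following is a minimal, single-threaded Python sketch of the idea: a small associative table (standing in for the CAM) serializes check-and-set updates to BFS visited flags, so a thread may only update a vertex after claiming an entry keyed on that vertex's address. All class and function names, the entry count, and the spin-until-acquired behavior are illustrative assumptions, not the paper's hardware design.

```python
# Hypothetical software model of a CAM used as a synchronizing cache.
# Illustrative only; names and parameters are assumptions, not the
# paper's actual hardware design.

class SynchronizingCAM:
    """Small associative table of in-flight addresses.

    acquire(addr) succeeds only if addr is not already held and a free
    entry exists; in hardware, a thread whose acquire fails would be
    parked until the current holder releases the entry.
    """
    def __init__(self, entries=16):
        self.entries = entries
        self.held = set()

    def acquire(self, addr):
        if addr in self.held or len(self.held) >= self.entries:
            return False          # match or table full: requester waits
        self.held.add(addr)       # fully associative insert claims the entry
        return True

    def release(self, addr):
        self.held.discard(addr)   # free the entry, waking any waiter


def bfs(adj, root, cam):
    """Level-synchronous BFS; each visited-flag update is guarded by a
    CAM entry keyed on the vertex id (the vertex id stands in for the
    memory address of its visited flag)."""
    dist = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                # Spin until this vertex's entry is acquired; in hardware
                # the thread would instead relinquish control here.
                while not cam.acquire(v):
                    pass
                if v not in dist:          # critical section: check-and-set
                    dist[v] = dist[u] + 1
                    nxt.append(v)
                cam.release(v)
        frontier = nxt
    return dist
```

In this sequential model the spin loop never actually triggers; its purpose is to show where a hardware thread would block, trading its slot for another ready thread rather than stalling the pipeline on a long in-memory synchronization.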