Performance drawbacks for matrix multiplication using set associative cache in GPU devices

Leonid Djinevski; Sime Arsenovski; Sasko Ristov; Marjan Gusev

Source

2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 193-198

Abstract

The performance of shared memory processors shows negative performance impulses (drawbacks) in certain regions when executing the basic matrix multiplication algorithm. In this paper we continue our analysis of the GPU memory hierarchy and the corresponding cache memory organization. We give a theoretical analysis of why a negative performance impulse appears for specific problem sizes. The main reason is the cache storage organization: the negative performance peak appears because matrix elements are mapped onto a single cache set instead of being spread over the whole cache. The experimental results confirm our theoretical analysis. We also propose a method to avoid the situations in which these performance drawbacks appear.
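To make the set-mapping argument concrete, the following Python sketch counts which cache sets a single column of an N x N row-major matrix of doubles falls into. The cache geometry (128-byte lines, 32 sets, 4 ways) and the helper names `set_index`, `sets_touched` and `cache_lines_available` are hypothetical assumptions chosen for illustration. They are not the parameters of the GPUs or the code used in the paper.

```python
from typing import Optional

# Assumed (hypothetical) cache geometry, NOT the parameters of any GPU
# measured in the paper: 128-byte lines, 32 sets, 4 ways (16 KB total).
LINE_SIZE = 128   # bytes per cache line (assumption)
NUM_SETS = 32     # number of sets (assumption)
WAYS = 4          # associativity (assumption)
ELEM_SIZE = 8     # bytes per double-precision element


def set_index(address: int) -> int:
    """Set-associative mapping: set = (address / line size) mod number of sets."""
    return (address // LINE_SIZE) % NUM_SETS


def sets_touched(n: int, ld: Optional[int] = None) -> int:
    """Distinct sets hit when walking one column of an n x n row-major matrix.

    Consecutive column elements are ld * ELEM_SIZE bytes apart, where ld is the
    leading dimension (row length in memory); by default ld == n (no padding).
    """
    ld = n if ld is None else ld
    stride = ld * ELEM_SIZE
    return len({set_index(k * stride) for k in range(n)})


def cache_lines_available(n: int, ld: Optional[int] = None) -> int:
    """At most sets * ways lines of the column can be resident at once."""
    return sets_touched(n, ld) * WAYS


if __name__ == "__main__":
    for n in (256, 384, 512, 513):
        print(n, sets_touched(n), cache_lines_available(n))
    # Padding the leading dimension by one element (hypothetical workaround).
    print("512 padded to ld=513:", sets_touched(512, ld=513))
```

For N = 512 the column stride is 4096 bytes, an exact multiple of `LINE_SIZE * NUM_SETS`, so every column element lands in the same set and, under these assumptions, only 4 of the column's 512 lines can be cached at once. For N = 513 the column spreads across all 32 sets. Padding the leading dimension is one common workaround for this effect; the abstract does not state whether it is the method the authors propose.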