Statistical performance of the ARM Cortex-A9 accelerator coherency port in the Xilinx Zynq SoC for real-time applications

Andrew Powell; Dennis Silage

doi:10.1109/ReConFig.2015.7393362

Source

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), pp. 1-6

Abstract

Using the Xilinx Zynq SoC, this work extends previous work by analysing and quantifying the effects of various outer (L2) caching behaviors and memory ordering models on memory accesses from a hardware accelerator (HA) implemented in programmable logic (PL) and from one of the two ARM Cortex-A9 CPUs. Memory accesses to the L2 cache/external memory and to on-chip memory (OCM) are both considered. The HA is configured to perform either coherent or non-coherent memory accesses through the accelerator coherency port (ACP), using full AXI4 transactions with bursts of 256 64-bit words. The L1 caches of the CPU are configured as write-back/no-write-allocate for all operations on memory with the normal ordering model. The effects of a dummy task executing on the CPU are also considered. Performance is measured as the turnaround time of memory accesses, with writes and reads measured separately. Because the target is real-time applications, the numerical results are presented as standard deviation, maximum, and mean values. All experiments are executed on the Avnet ZedBoard for 4,000 iterations with a 64 KB data payload. Memory accesses to either OCM or external memory, from either the CPU or the ACP, are shown to have similar performance, but only under specific behaviors of the memory hierarchy. It is also shown that memory whose ordering model is configured as device can hold several advantages over the normal and strongly-ordered models.
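The transfer sizes and the three reported statistics can be illustrated with a small sketch. It is not the authors' measurement code. It assumes that 1 KB means 1024 bytes, that the payload is moved entirely in full-length bursts, and that the standard deviation is the sample standard deviation (the abstract does not say which form is used). The function names `bursts_per_payload` and `summarize` are hypothetical, and the sample turnaround times in the usage note are made up for illustration only.

```python
import statistics

WORD_BYTES = 8              # one 64-bit AXI data word
BURST_WORDS = 256           # words per AXI4 burst, as stated in the abstract
PAYLOAD_BYTES = 64 * 1024   # 64 KB payload, assuming 1 KB = 1024 bytes


def bursts_per_payload(payload_bytes=PAYLOAD_BYTES,
                       burst_words=BURST_WORDS,
                       word_bytes=WORD_BYTES):
    """Number of bursts needed to move the payload (rounded up)."""
    burst_bytes = burst_words * word_bytes
    return -(-payload_bytes // burst_bytes)  # ceiling division


def summarize(turnaround_times):
    """Mean, maximum, and sample standard deviation of turnaround times."""
    return {
        "mean": statistics.mean(turnaround_times),
        "max": max(turnaround_times),
        # Assumption: sample (n - 1) standard deviation.
        "stdev": statistics.stdev(turnaround_times),
    }
```

Under these assumptions, `bursts_per_payload()` gives 32 bursts of 2048 bytes each for one 64 KB payload. In an experiment of the kind described, `summarize` would be applied separately to the write and the read turnaround times collected over the 4,000 iterations, for example `summarize([1.0, 2.0, 3.0])` on a placeholder list. The maximum matters as much as the mean here, since a real-time design is usually sized for the worst observed case.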