Modern operating system kernels, such as Linux, address the trade-off between portability and performance by exposing a generic interface to user space programs, while maintaining architecture-dependent functionality as a set of separate components inside the kernel space. In particular, performance can only be achieved by ensuring that the architecture-dependent code takes advantage of the facilities offered by the underlying hardware. In turn, architecture-dependent optimization requires tools to observe the system's behaviour and evaluate its performance. In this paper we analyze the state-of-the art performance measurement frameworks for the Linux kernel. We describe our experience with said frameworks in a scenario comprising a Linux kernel running Para virtualized on top of an L4 micro kernel, and assess their suitability for our scenario by using them to uncover a set of particular performance issues. In this respect we demonstrate their effectiveness through a detailed evaluation and propose an approach to generalize their usage for evaluating architecture-dependent performance.