The advent of modern GPU architectures has enabled computers to use general-purpose GPU (GPGPU) computing to tackle large-scale problems at a low computational cost. This technology is also available on mobile devices, where it addresses one of the primary constraints of recent devices: the power envelope. Unfortunately, recent mobile GPUs offer limited numerical accuracy, which can prevent them from running large-scale data analysis tasks such as principal component analysis (PCA) (Shlens, 0000). The goal of our work is to address this limitation by combining the high precision available on the CPU with the power efficiency of the mobile GPU. In this paper, we exploit the shared memory architecture of mobile devices to enhance CPU–GPU collaboration and speed up PCA computation without sacrificing precision. Experimental results suggest that this approach drastically reduces the power consumption of the mobile device while accelerating the overall workload. More generally, we claim that this approach can be extended to accelerate other vectorized computations on mobile devices while maintaining numerical accuracy.