The high compression efficiency provided by the High Efficiency Video Coding (HEVC) standard comes at the cost of a significant increase in the computational load at the decoder. This increased burden is a limiting factor for achieving real-time decoding, especially for high-definition video sequences (e.g., Ultra HD 4K). To address this limitation, a highly parallel HEVC decoder for state-of-the-art graphics processing units (GPUs), named GHEVC, is presented. In contrast to our previous contributions, the data-parallel GHEVC decoder integrates the whole decompression pipeline (except for entropy decoding) for both intra- and inter-frames. Furthermore, its processing efficiency is improved by keeping the decompressed frames in GPU memory for subsequent inter-frame prediction. The proposed GHEVC decoder is fully compliant with the HEVC standard, with explicit synchronization points ensuring the correct execution order of the HEVC modules. The decoder is experimentally evaluated on different GPU devices and over an extensive range of recommended HEVC configurations and video sequences. In the Random Access configuration on the NVIDIA GeForce GTX TITAN X GPU, it achieves average frame rates of 145, 318, and 605 frames per second for Ultra HD 4K, WQXGA, and Full HD, respectively.
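
To put the reported frame rates in perspective, the following minimal sketch converts them into a per-frame time budget and an approximate luma pixel throughput. The resolutions used here (3840x2160 for Ultra HD 4K, 2560x1600 for WQXGA, 1920x1080 for Full HD) are the nominal values for these formats and are an assumption, since they are not stated above; the helper names are purely illustrative.

```python
# Average frame rates quoted in the text (Random Access, GTX TITAN X).
REPORTED_FPS = {"Ultra HD 4K": 145, "WQXGA": 318, "Full HD": 605}

# ASSUMPTION: nominal luma resolutions (width, height) for each format;
# the text does not state the exact sequence dimensions.
ASSUMED_RESOLUTION = {
    "Ultra HD 4K": (3840, 2160),
    "WQXGA": (2560, 1600),
    "Full HD": (1920, 1080),
}


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def pixel_rate(name: str) -> int:
    """Approximate luma pixels decoded per second for a given format."""
    width, height = ASSUMED_RESOLUTION[name]
    return width * height * REPORTED_FPS[name]


if __name__ == "__main__":
    for name, fps in REPORTED_FPS.items():
        print(
            f"{name}: {frame_budget_ms(fps):.2f} ms/frame, "
            f"{pixel_rate(name) / 1e9:.2f} Gpixel/s"
        )
```

Under these assumed resolutions, the three formats correspond to a similar luma throughput of roughly 1.2 to 1.3 Gpixel/s, which suggests that the reported frame rate scales approximately inversely with frame size.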