This paper proposes a flexible and efficient implementation of the 2D $N$-point discrete cosine transform (DCT) for the High Efficiency Video Coding (HEVC) standard. The DCT is implemented as a Walsh–Hadamard transform (WHT) followed by Givens rotations. This scheme is exploited to derive an adaptive algorithm that computes four different approximations, ranging from the complete DCT to the WHT, by selectively skipping some rotations. A statistical analysis of DCT usage is presented and used to derive a precomputation mechanism that adaptively skips rotations. Each approximation, referred to as an operating mode, yields a large saving of operations at the expense of a very small quality loss. Two 2D-DCT architectures are then proposed: the first is fully unfolded, while the second is folded. Both designs are synthesized with a 90-nm standard-cell library for a clock frequency of 250 MHz. The unfolded and folded architectures support real-time processing of 8K UHD video sequences at 64 and 26 fps, respectively, and show higher throughput and lower gate count than state-of-the-art implementations. Moreover, power savings ranging from 28% to 56% can be achieved by working within the proposed operating modes.
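
To make the WHT-plus-rotations idea concrete, the following is a minimal sketch for the smallest case only: a 1D 4-point orthonormal floating-point DCT-II, which factors into a sequency-ordered WHT followed by a single Givens rotation of angle $\pi/8$ on coefficients 1 and 3. It is an illustration under these assumptions, not the proposed $N$-point 2D design, its integer HEVC arithmetic, or its four operating modes. All function names and the `skip_rotation` flag are hypothetical.

```python
import math

def wht4_sequency(x):
    """Orthonormal 4-point WHT with rows in sequency order (illustrative helper)."""
    a, b, c, d = x
    return [
        0.5 * (a + b + c + d),
        0.5 * (a + b - c - d),
        0.5 * (a - b - c + d),
        0.5 * (a - b + c - d),
    ]

def givens(u, v, theta):
    """Apply a plane (Givens) rotation of angle theta to the pair (u, v)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * u + s * v, -s * u + c * v

def dct4_via_wht(x, skip_rotation=False):
    """4-point DCT-II computed as WHT followed by one Givens rotation.

    With skip_rotation=True the rotation is omitted and the plain WHT is
    returned, which is the coarsest approximation in this toy setting.
    """
    w = wht4_sequency(x)
    if skip_rotation:
        return w
    # Even-indexed outputs already equal the DCT; only (1, 3) need rotating.
    y1, y3 = givens(w[1], w[3], math.pi / 8)
    return [w[0], y1, w[2], y3]

def dct4_reference(x):
    """Direct orthonormal DCT-II from its definition, used only as a check."""
    n = 4
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out
```

In this sketch, skipping the rotation removes all multiplications by non-trivial constants, leaving only additions, subtractions, and a common scaling factor. This is the kind of trade-off between operation count and accuracy that the operating modes exploit.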