The design of an H.264/AVC interpolation unit is challenging because two new coding features, variable block size (VBS) and the 6-tap interpolation filter, lead to high memory bandwidth and heavy computational complexity. In this paper, a novel one-step interpolation algorithm is proposed that effectively reduces the number of processing cycles by requiring fewer memory accesses. Moreover, a data reuse scheme is adopted to further save processing cycles and memory bandwidth. A high-performance hardware architecture is implemented based on these methods. As a result, a 26% reduction in memory bandwidth and a 45% reduction in processing cycles are achieved. These results show that the proposed architecture is an efficient hardware acceleration solution suitable for real-time encoders.
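As background for the computation this architecture accelerates, the sketch below shows the standard H.264/AVC luma fractional-sample interpolation. Half-sample values use the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and clipping. The centre half-sample position is computed from unrounded intermediate sums. Quarter-sample values are rounded averages. It also counts the integer reference samples that a w x h block needs, which is why VBS raises memory bandwidth. This is not the paper's one-step algorithm or its data reuse scheme. The function names are hypothetical, and 8-bit samples are assumed.

```python
# Minimal sketch of H.264/AVC luma interpolation (assumes 8-bit samples).
# Function names are hypothetical; this is not the paper's hardware algorithm.
TAPS = (1, -5, 20, 20, -5, 1)


def clip1(x, bit_depth=8):
    """Clip a value to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, x))


def tap6(p):
    """Apply the 6-tap filter to six consecutive samples (unrounded sum)."""
    return sum(c * v for c, v in zip(TAPS, p))


def half_pel_row(row):
    """Horizontal half-sample values; output i lies between row[i+2] and row[i+3]."""
    return [clip1((tap6(row[i:i + 6]) + 16) >> 5) for i in range(len(row) - 5)]


def center_half_pel(block):
    """Centre half-sample position 'j' from a 6x6 block of integer samples.

    The horizontal intermediate sums are kept unrounded and filtered again
    vertically, then rounded once with a shift of 10 (5 bits per filter pass).
    """
    inter = [tap6(r) for r in block]  # one unrounded sum per row
    return clip1((tap6(inter) + 512) >> 10)


def quarter_pel(a, b):
    """Quarter-sample value as the rounded-up average of two neighbours."""
    return (a + b + 1) >> 1


def ref_window_pixels(w, h):
    """Integer samples needed to interpolate a w x h block at any fractional
    position: the 6-tap filter needs 2 extra samples before and 3 after
    in each direction."""
    return (w + 5) * (h + 5)
```

For example, fetching sixteen 4x4 blocks independently reads `16 * ref_window_pixels(4, 4)` = 1296 reference samples, while one 16x16 block reads `ref_window_pixels(16, 16)` = 441. Much of the extra traffic comes from overlapping windows, which is the kind of redundancy a data reuse scheme can target. These numbers are illustrative only and are not how the 26% figure above was derived.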