This paper proposes a novel processing order and an efficient architecture for real-time implementation of the deblocking filter in the H.264/AVC video coding standard. The deblocking filter is data- and computation-intensive, and it increases the execution time of both encoding and decoding. The proposed double-cross processing order uses a parallel flow to improve processing speed and reduce memory access. The proposed architecture saves about 38-80% of memory accesses compared with other designs. This efficiency raises processing performance and lowers the operating frequency required for standardized video specifications. For the general video specification HDTV1080p (1920 × 1080 @ 30 fps), the required operating frequency is only 11.5 MHz. For the high-resolution QFHD specification (3840 × 2160 @ 30 fps), it is only 46.6 MHz. The implementation requires about 20.14 K gates and 64 × 32 bits of memory. The power dissipation for the QFHD specification is 7.7 mW at 46.6 MHz.
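
As a rough consistency check on the operating frequencies quoted above, the following sketch converts each resolution into a macroblock rate and divides the stated clock frequency by it. It assumes 16 × 16 macroblocks, frame heights padded up to a whole number of macroblock rows, and a clock frequency that scales linearly with the macroblock rate. The function names are hypothetical, and the resulting cycles-per-macroblock values are an inference from the quoted numbers, not figures reported in the paper.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblock throughput needed for real-time processing (hypothetical helper)."""
    mbs_wide = math.ceil(width / MB_SIZE)
    mbs_high = math.ceil(height / MB_SIZE)  # assumption: partial rows are padded
    return mbs_wide * mbs_high * fps


def implied_cycles_per_mb(freq_hz, width, height, fps):
    """Clock cycles available per macroblock at the given frequency (an inference)."""
    return freq_hz / macroblocks_per_second(width, height, fps)


# Figures taken from the abstract: (frequency in Hz, width, height, fps)
hd = implied_cycles_per_mb(11.5e6, 1920, 1080, 30)
qfhd = implied_cycles_per_mb(46.6e6, 3840, 2160, 30)

# On-chip memory quoted as 64 x 32 bits
memory_bits = 64 * 32
memory_bytes = memory_bits // 8

print(f"HDTV1080p: {hd:.1f} cycles per macroblock")
print(f"QFHD:      {qfhd:.1f} cycles per macroblock")
print(f"Memory:    {memory_bits} bits ({memory_bytes} bytes)")
```

Under these assumptions both specifications work out to roughly 47 to 48 cycles per macroblock, so the two quoted frequencies are consistent with a single per-macroblock cycle budget.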