In this paper, we study and analyze the computational complexity of deblocking filter in H.264/AVC baseline decoder based on SimpleScalar/ARM simulator. The simulation result shows that the memory reference, content activity check operations and filter operations are known to be very time consuming in the decoder of this new video coding standard. In order to improve overall system performance, we propose a window processing approach with efficient VLSI architecture which simultaneously processes the horizontal filtering of vertical edge and vertical filtering of horizontal edge. As a result, the memory performance of proposed architecture is improved by 4 times when compared to software implementation. Moreover, the system performance of our proposal significantly outperforms the previous designs as shown in result section