In motion estimation, fast algorithms usually lead to an irregular search flow, and power reduction at the architecture level is limited by poor data reuse (DR). In this paper, a parallel integer motion estimation (IME) hardware design for H.264/AVC is proposed that combines techniques at the algorithm and architecture levels. The "2-D SAD Tree" is adopted to support intra- and inter-candidate DR for the content-adaptive parallel variable block size (VBS) four step search algorithm. A ladder-shaped reference data arrangement is proposed to support DR in both the horizontal and vertical directions, while an advanced search flow is applied to reduce latency cycles. Together, these two techniques reduce the power of the search window SRAMs by 77.6%. According to the implementation results, in ultra low power mode only 1.424 mW is required for real-time encoding of CIF 30 fps video at an operating frequency of 13.5 MHz.
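As a rough software illustration of the intra-candidate data reuse that a SAD tree exploits, the sketch below computes the sixteen 4x4 SADs of one search candidate once and merges them into the SADs of all larger H.264/AVC partitions, giving 41 SADs per 16x16 macroblock. It only shows the arithmetic, under the assumption of plain 16x16 lists of luma samples; the function names (`sad_4x4_grid`, `region_sum`, `vbs_sads`) are hypothetical and this is not the hardware architecture proposed here.

```python
# Partition name (width x height in pixels) -> (height, width) in 4x4-block units.
PARTITIONS = [
    ("4x4", 1, 1), ("8x4", 1, 2), ("4x8", 2, 1), ("8x8", 2, 2),
    ("16x8", 2, 4), ("8x16", 4, 2), ("16x16", 4, 4),
]


def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def region_sum(grid, row, col, h, w):
    """Sum the 4x4 SADs covering an h-by-w region of the grid."""
    return sum(grid[r][c] for r in range(row, row + h)
               for c in range(col, col + w))


def vbs_sads(cur, ref):
    """Derive the SADs of every partition size from the shared 4x4 SADs."""
    grid = sad_4x4_grid(cur, ref)  # pixel differences are computed only once
    return {
        name: [region_sum(grid, r, c, h, w)
               for r in range(0, 4, h) for c in range(0, 4, w)]
        for name, h, w in PARTITIONS
    }


if __name__ == "__main__":
    cur = [[3] * 16 for _ in range(16)]   # illustrative current macroblock
    ref = [[1] * 16 for _ in range(16)]   # illustrative reference candidate
    sads = vbs_sads(cur, ref)
    print({name: len(values) for name, values in sads.items()})
```

Each pixel difference is evaluated once per candidate and shared by all seven partition sizes, which is the kind of reuse that lets variable block size search run in parallel without recomputing absolute differences.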