This paper proposes a fast motion estimation algorithm that is well suited to VLSI hardware implementation. The algorithm has three features. First, subsampling based on a "Haar" low-pass filter reduces the computational complexity at each search position to about 25% of that of the original algorithm. Second, a modified motion vector prediction eliminates the data dependence among sub-partitions in the same macroblock (MB), so that variable block size motion estimation (VBSME) with integer-pixel accuracy can be processed in parallel. Third, an "adaptive sub-search window" scheme further reduces the computation cost and facilitates reuse of reference frame data, which reduces memory transfers from the external RAM to the on-chip SRAM. The proposed VBSME algorithm is therefore very suitable for parallel VLSI implementation.
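As an illustration of the first feature, the following minimal sketch shows how low-pass subsampling can reduce the per-position matching cost. It rests on assumptions not stated in the abstract: the "Haar" low-pass filter is taken to be a 2x2 average followed by decimation by two in each dimension, the matching criterion is taken to be the sum of absolute differences (SAD), and the block size is 16x16. The names `haar_lowpass_subsample` and `sad` are hypothetical, and the pixel data is synthetic.

```python
def haar_lowpass_subsample(block):
    """Average each non-overlapping 2x2 neighbourhood of a 2-D integer block.

    Assumed form of the Haar low-pass band (integer division, no rounding
    offset); the output has half the rows and half the columns of the input.
    """
    h, w = len(block), len(block[0])
    assert h % 2 == 0 and w % 2 == 0, "block dimensions must be even"
    return [
        [
            (block[y][x] + block[y][x + 1]
             + block[y + 1][x] + block[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Synthetic 16x16 current and reference blocks (illustrative data only).
cur = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
ref = [[3 * x + 5 * y + 4 for x in range(16)] for y in range(16)]

cur_sub = haar_lowpass_subsample(cur)
ref_sub = haar_lowpass_subsample(ref)

# Number of absolute-difference terms evaluated per search position.
full_terms = len(cur) * len(cur[0])
sub_terms = len(cur_sub) * len(cur_sub[0])
ratio = sub_terms / full_terms

print(f"full-resolution SAD terms: {full_terms}")
print(f"subsampled SAD terms:      {sub_terms}")
print(f"cost ratio:                {ratio:.2f}")
print(f"subsampled SAD:            {sad(cur_sub, ref_sub)}")
```

Under these assumptions, each search position compares one quarter as many samples as the full-resolution match, which is consistent with the stated figure of about 25%. The sketch counts only the matching terms and leaves out the cost of the filtering itself.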