A motion estimation (ME) processor for an H.264 encoder is implemented in 40nm CMOS. Through algorithm and architecture co-optimization, its throughput reaches 1.59Gpixel/s for 7680×4320p 48fps video, at least 7.5 times faster than previous chips. Its core power dissipation is 622mW at 210MHz, with energy efficiency improved by 23%. The DRAM bandwidth requirement is reduced by 68%. With a maximum search range of ±211 (horizontal) by ±106 (vertical) around a predictive search center, the proposed ME processor accommodates the large motion of ultra-high-resolution video.
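
The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check built from the quoted numbers, not a model of the chip. The variable and function names are illustrative. The 16×16 macroblock size is the standard H.264 value rather than something stated here. The window count assumes integer-pel positions and says nothing about how many candidates the processor actually evaluates.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT, FPS = 7680, 4320, 48
CLOCK_HZ = 210e6            # 210 MHz core clock
CORE_POWER_W = 0.622        # 622 mW core power
RANGE_X, RANGE_Y = 211, 106 # maximum search range, +/- pixels

# Assumption: standard H.264 macroblock of 16x16 luma pixels.
MB_SIZE = 16


def pixel_rate(width, height, fps):
    """Pixels that must be processed per second for real-time video."""
    return width * height * fps


def window_positions(range_x, range_y):
    """Integer-pel positions inside a +/-range_x by +/-range_y window."""
    return (2 * range_x + 1) * (2 * range_y + 1)


throughput = pixel_rate(WIDTH, HEIGHT, FPS)             # 1,592,524,800 pixel/s
pixels_per_cycle = throughput / CLOCK_HZ                # about 7.58
energy_per_pixel_nj = CORE_POWER_W / throughput * 1e9   # about 0.39 nJ
macroblocks_per_s = throughput // (MB_SIZE * MB_SIZE)   # 6,220,800
cycles_per_macroblock = CLOCK_HZ / macroblocks_per_s    # about 33.8
positions = window_positions(RANGE_X, RANGE_Y)          # 423 * 213 = 90,099

if __name__ == "__main__":
    print(f"throughput:            {throughput / 1e9:.2f} Gpixel/s")
    print(f"pixels per cycle:      {pixels_per_cycle:.2f}")
    print(f"energy per pixel:      {energy_per_pixel_nj:.2f} nJ")
    print(f"cycles per macroblock: {cycles_per_macroblock:.1f}")
    print(f"window positions:      {positions}")
```

Under these assumptions the processor has roughly 34 clock cycles per macroblock, while the maximum window spans about 90,000 integer-pel positions, which suggests why an exhaustive search over the full window would be impractical at this clock rate.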