In this paper, we investigate the use of field-programmable gate arrays (FPGAs) in the design of a highly scalable variable block size motion estimation architecture for the H.264/AVC video encoding standard. The scalability of the architecture allows the system to be incorporated into low-cost single-FPGA solutions for low-resolution video encoding applications as well as into high-performance multi-FPGA solutions targeting high-resolution applications. To overcome the performance gap between FPGAs and application-specific integrated circuits (ASICs), our algorithm increases its parallelism as the design scales while minimizing the use of memory bandwidth. The core computing unit of the architecture is implemented on FPGAs and its performance is reported. A single computing unit achieves 28 frames per second (fps) for 640×480 VGA video while occupying only 4% of a Xilinx XC5VLX330 FPGA. With 8 computing units at 37% device utilization, the architecture achieves 31 fps when encoding full 1920×1088 progressive HDTV video.
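
As a rough consistency check on the reported figures, the two operating points can be compared in terms of macroblock throughput. The sketch below is illustrative only: it assumes the standard 16x16 H.264/AVC macroblock and assumes that the work per macroblock (search range, number of reference frames) is the same at both resolutions, which the text above does not state. The function name `macroblocks_per_second` is hypothetical and not taken from the paper.

```python
MB_SIZE = 16  # H.264/AVC macroblock edge length in luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblocks processed per second at a given resolution and frame rate."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps


# Reported operating points: 1 unit at VGA / 28 fps, 8 units at 1920x1088 / 31 fps.
vga_rate = macroblocks_per_second(640, 480, 28)
hd_rate = macroblocks_per_second(1920, 1088, 31)

# Throughput ratio between the 8-unit and the 1-unit configuration.
# This is only meaningful under the equal-work-per-macroblock assumption above.
scaling = hd_rate / vga_rate

print(f"VGA, 1 unit : {vga_rate} macroblocks/s")
print(f"HD, 8 units : {hd_rate} macroblocks/s")
print(f"Throughput ratio: {scaling:.2f}x with 8 units")
```

Under these assumptions, the 8-unit configuration sustains roughly 7.5 times the macroblock rate of a single unit, which is in line with near-linear scaling of throughput with the number of computing units.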