Rate control (RC) techniques play an important role in interactive video coding applications, especially in video streaming applications with bandwidth constraints. Among the RC algorithms in the H.264 reference software JM, the basic unit (BU)-level RC algorithm achieves better video quality than the frame-level one. However, the inherent sequential processing in the H.264 BU-level RC algorithm makes it difficult to implement in a pipelined H.264 hardware encoder without increasing the processing latency. In this paper, we propose a new H.264 BU-level RC algorithm and an associated architecture suited to hardware realization, both built on a new predictor model that predicts the mean absolute difference (MAD) value and the target bits. The proposed algorithm breaks the sequential processing dependence of the original H.264 RC algorithm and reduces the internal buffer size by up to 80.6% for H.264 D1 video encoding, while maintaining good video quality.