The detection of regions of interest (ROI) plays an increasingly important role in image analysis and processing. The human visual system (HVS) actively seeks interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract a viewer's attention first than their surroundings. Based on this mechanism of the HVS, this paper proposes a focus-of-attention model for detecting the attended regions in video sequences. The model builds gray-level histograms, measures the similarity between adjacent frames, takes the maximum similarity as its prediction model, and from it obtains the position of the focus of attention in the next frame. In dynamic scenes, the model directs more attention to moving targets, which is consistent with the human attention mechanism. Experimental results indicate that the model is robust and runs in real time.
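
The matching step summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes histogram intersection as the similarity measure, a square window around the focus, and an exhaustive search over a small neighborhood in the next frame. The names `gray_histogram`, `similarity`, and `predict_focus`, and the parameters `half`, `search`, and `bins`, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame, pixel values in 0..255


def gray_histogram(frame: Frame, center: Tuple[int, int],
                   half: int, bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of the square window around center."""
    rows, cols = len(frame), len(frame[0])
    r0, c0 = center
    hist = [0.0] * bins
    count = 0
    # Clip the window to the frame borders.
    for r in range(max(0, r0 - half), min(rows, r0 + half + 1)):
        for c in range(max(0, c0 - half), min(cols, c0 + half + 1)):
            hist[frame[r][c] * bins // 256] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def similarity(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection (an assumed measure): 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def predict_focus(prev_frame: Frame, next_frame: Frame,
                  focus: Tuple[int, int], half: int = 2,
                  search: int = 3, bins: int = 16) -> Tuple[Tuple[int, int], float]:
    """Find the position in next_frame whose window histogram is most
    similar to the histogram of the focus window in prev_frame."""
    rows, cols = len(next_frame), len(next_frame[0])
    ref = gray_histogram(prev_frame, focus, half, bins)
    best_pos, best_sim = focus, -1.0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = focus[0] + dr, focus[1] + dc
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # candidate center falls outside the frame
            sim = similarity(ref, gray_histogram(next_frame, (r, c), half, bins))
            if sim > best_sim:  # keep the candidate with maximum similarity
                best_pos, best_sim = (r, c), sim
    return best_pos, best_sim
```

Calling `predict_focus(prev_frame, next_frame, focus)` returns the predicted focus position in the next frame together with its similarity score. Restricting the search to a small neighborhood keeps the cost per frame low, which is one plausible way to meet a real-time constraint; the actual search strategy of the model may differ.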