In this paper, we present a novel matching framework for stereo video sequences based on disparity prediction via stereo-motion fusion. Fusion-based methods tackle the matching problem by integrating different depth modalities in a cooperative way, resolving the ambiguities of each individual modality and producing a more accurate disparity map. Unfortunately, only a few works consider video sequences, even though they are common in practical applications of stereo vision. The proposed method aims to achieve a high-quality disparity map while maintaining temporal consistency through a disparity prediction process that tracks features of the disparity map. Experimental results show that our fusion algorithm quantitatively and qualitatively outperforms previous fusion-based matching methods.