In this paper, we present a method for extracting moving objects from monocular image sequences. The proposed method is based on graph cuts defined on a spatio-temporal region adjacency graph (RAG). First, we over-segment each frame of the video and take the resulting regions as the vertices of a 3D spatio-temporal graph. Second, multiple cues are fused to extract objects accurately. Finally, foreground/background segmentation is achieved efficiently by a binary graph cut. Experimental results show that the proposed method improves segmentation performance compared with popular existing methods.
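The pipeline outlined above can be illustrated with a minimal sketch; it is not the authors' implementation. Several things are assumed here because the abstract does not specify them: the over-segmentation is given as hand-written label maps, temporal edges link regions that overlap at the same pixel position in consecutive frames, the fused cues are reduced to hypothetical per-region costs `fg_cost` and `bg_cost`, the pairwise term is a constant `smooth`, and a plain Edmonds-Karp max-flow stands in for whatever graph-cut solver the method uses. The names `build_rag` and `min_cut_labels` are illustrative only.

```python
from collections import defaultdict, deque


def build_rag(frames):
    """Edge set of a spatio-temporal RAG; a vertex is (frame index, region label)."""
    edges = set()
    for t, lab in enumerate(frames):
        h, w = len(lab), len(lab[0])
        for y in range(h):
            for x in range(w):
                u = (t, lab[y][x])
                # Spatial edges: 4-connected neighbours carrying a different label.
                if x + 1 < w and lab[y][x + 1] != lab[y][x]:
                    edges.add(frozenset((u, (t, lab[y][x + 1]))))
                if y + 1 < h and lab[y + 1][x] != lab[y][x]:
                    edges.add(frozenset((u, (t, lab[y + 1][x]))))
                # Temporal edges (assumption): same pixel position in the next frame.
                if t + 1 < len(frames):
                    edges.add(frozenset((u, (t + 1, frames[t + 1][y][x]))))
    return edges


def min_cut_labels(vertices, edges, fg_cost, bg_cost, smooth=1.0):
    """Binary labelling by s-t min cut; returns the set of foreground vertices."""
    S, T = "source", "sink"
    cap = defaultdict(lambda: defaultdict(float))
    for v in vertices:
        cap[S][v] += bg_cost[v]  # paid if v ends up on the sink (background) side
        cap[v][T] += fg_cost[v]  # paid if v ends up on the source (foreground) side
    for e in edges:
        u, v = tuple(e)
        cap[u][v] += smooth      # paid if u and v receive different labels
        cap[v][u] += smooth
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break
        # Find the bottleneck capacity along the path, then push that much flow.
        flow, v = float("inf"), T
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Vertices still reachable from the source form the foreground side of the cut.
    return {v for v in parent if v != S}


# Toy input: two 2x3 frames in which region 1 grows to the left over time.
frames = [
    [[0, 0, 1], [0, 0, 1]],
    [[0, 1, 1], [0, 1, 1]],
]
edges = build_rag(frames)
vertices = {v for e in edges for v in e}
# Hypothetical fused-cue costs: region 1 looks like the moving object.
fg_cost = {(0, 0): 5.0, (1, 0): 5.0, (0, 1): 0.0, (1, 1): 1.0}
bg_cost = {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 5.0, (1, 1): 4.0}
foreground = min_cut_labels(vertices, edges, fg_cost, bg_cost, smooth=1.0)
print(sorted(foreground))
```

Operating on regions rather than pixels keeps the graph small, which is one plausible reason the cut can be computed efficiently; in practice the unary and pairwise weights would come from the fused cues rather than from constants.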