Three-dimensional video (3DV) consists of multi-view video and multi-view depth video. It provides three-dimensional perception, which draws viewers' attention to regions with depth contrast and to pop-out regions. 3DV also exhibits high temporal and inter-view correlation. In this paper, we define a novel depth-perceptual region of interest (ROI) for 3DV and propose two joint extraction schemes according to the correlation types of 3DV. We then propose a depth-based ROI extraction method that jointly uses depth, motion, and texture information. We also present a novel inter-view tracking method for 3DV, in which the inter-view correlation and the ROIs already extracted from neighboring views are used to facilitate ROI extraction across views. Experimental results show that the proposed ROI extraction and tracking algorithms achieve high extraction accuracy with low complexity.