In this paper, a new spatio-temporal saliency model is presented. Based on the idea that both spatial and temporal features are needed to determine the saliency of a video, the model builds on the principle that features which are locally contrasted and globally rare are salient. The features used are both spatial (color and orientations) and temporal (motion amplitude and direction), extracted at several scales. To improve robustness to camera motion, a dedicated module estimates the global motion, and to improve temporal consistency, the saliency maps are combined after a temporal filtering step. The model is evaluated on a dataset of 24 videos split into 5 categories (Abnormal, Surveillance, Crowds, Moving camera, and Noisy), on which it outperforms several state-of-the-art saliency models.
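The core ideas summarized above (global rarity as saliency, fusion of spatial and temporal feature maps, and temporal filtering of the resulting maps) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the histogram quantization, and the exponential smoothing factor are all assumptions.

```python
import numpy as np

def rarity_saliency(feature, bins=16):
    """Rarity-based saliency sketch: globally rare feature values are salient.
    Each pixel's saliency is the self-information -log p of its quantized
    feature value, estimated over the whole frame."""
    edges = np.linspace(feature.min(), feature.max() + 1e-9, bins)
    q = np.digitize(feature, edges)
    counts = np.bincount(q.ravel(), minlength=bins + 2).astype(float)
    p = counts / counts.sum()
    sal = -np.log(p[q] + 1e-12)
    # Normalize to [0, 1] so maps from different features are comparable.
    return (sal - sal.min()) / (np.ptp(sal) + 1e-12)

def fuse_and_smooth(spatial_maps, temporal_maps, prev=None, alpha=0.7):
    """Fuse per-feature saliency maps by averaging, then apply a first-order
    exponential temporal filter for consistency across frames
    (alpha is an assumed smoothing factor, not from the paper)."""
    fused = np.mean(spatial_maps + temporal_maps, axis=0)
    if prev is None:
        return fused
    return alpha * fused + (1 - alpha) * prev

# Usage on synthetic single-feature frames:
rng = np.random.default_rng(0)
frame_t0 = rarity_saliency(rng.random((32, 32)))
frame_t1 = rarity_saliency(rng.random((32, 32)))
smoothed = fuse_and_smooth([frame_t1], [frame_t1], prev=frame_t0)
```

In a full model, one such rarity map would be computed per feature (color, orientation, motion amplitude, motion direction) and per scale, with motion features taken relative to the estimated global camera motion before the fusion step.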