This paper deals with the selection of relevant motion within a scene. The proposed method is based on 3D features extraction and their rarity quantification to compute bottom-up saliency maps. We show that the use of 3D motion features namely the motion direction and velocity is able to achieve much better results than the same algorithm using only 2D information. This is especially true in close scenes with small groups of people or moving objects and frontal view. The proposed algorithm uses motion features but it can be easily generalized to other dynamic or static features. It is implemented on a platform for real-time signal analysis called Max/Msp/Jitter. Social signal processing, video games, gesture processing and, in general, higher level scene understanding can benefit from this method.