Tracking with multiple cameras with nonoverlapping fields of view is challenging because objects typically differ in appearance when seen from different cameras. In this paper we use a probabilistic approach to track people across multiple, sparsely distributed cameras, where an observation corresponds to a person walking through the field of view of a camera. Modelling appearance and spatio-temporal aspects probabilistically allows us to deal with the uncertainty but, to obtain good results, it is important to maximise the information content of the features we extract from the raw video images. Occlusions and ambiguities within an observation introduce noise, which makes the inference less confident. We therefore propose to position stereo cameras on the ceiling, facing straight down, which greatly reduces the possibility of occlusions. This positioning, however, places specific requirements on the feature extraction algorithms. We show that depth information can be used to resolve ambiguities and extract meaningful features, resulting in significant improvements in tracking accuracy.