In this paper, we propose a spatio-temporal dependencies learning (STDL) method for action recognition. Inspired by the self-organizing map, our method learns implicit spatio-temporal dependencies from sequential sets of action features while preserving the intrinsic topological structure of human actions. A further advantage is its ability to project high-dimensional action features onto a lower-dimensional latent neural distribution, which reduces both the computational cost and the data redundancy of the learning and recognition process. An ensemble learning strategy based on expectation-maximization is adopted to estimate the latent parameters of the STDL model. The effectiveness and robustness of the proposed model are verified through extensive experiments on several benchmark datasets.
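The abstract does not give the STDL update rules. To illustrate only the general idea of a topology-preserving projection from high-dimensional features to a low-dimensional map, the sketch below implements a generic Kohonen self-organizing map in plain Python. It is not the STDL model and omits the EM-based ensemble estimation entirely. The helper names `train_som`, `best_matching_unit` and `project`, the 4x4 grid, the linear decay schedules, and the synthetic 8-D "action features" are hypothetical choices made for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    # Grid coordinate (row, col) whose weight vector is closest to x (squared Euclidean).
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    # Generic Kohonen SOM (hypothetical helper, NOT the STDL update rule).
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks to a floor of 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood on the 2-D grid: nearby nodes move together,
                # which is what preserves the topology of the input space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])
            step += 1
    return weights

def project(weights, sequence):
    # Replace each high-dimensional frame feature by a 2-D grid coordinate.
    return [best_matching_unit(weights, x) for x in sequence]

# Toy usage with synthetic 8-D features standing in for two kinds of action frames.
rng = random.Random(1)
walk = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(20)]
jump = [[rng.gauss(5.0, 0.1) for _ in range(8)] for _ in range(20)]
som = train_som(walk + jump, rows=4, cols=4)
trajectory = project(som, walk[:3] + jump[:3])
print(trajectory)  # six (row, col) pairs instead of six 8-D vectors
```

A sequence of frames thus becomes a short trajectory on a small grid. This is the kind of dimensionality reduction the abstract refers to, although the actual STDL model learns a latent neural distribution rather than hard grid assignments.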