This paper presents a view-invariant approach to gait recognition in multi-camera scenarios exploiting a joint spatio-temporal data representation and analysis. First, multi-view information is employed to generate a 3D voxel reconstruction of the scene under study. The analyzed subject is tracked and its centroid and orientation allow recentering and aligning the volume associated to it, thus obtaining a representation invariant to translation, rotation and scaling. Temporal periodicity of the walking cycle is extracted to align the input data in the time domain. Finally, Hyperspherical Radon Transform is presented as an efficient tool to obtain features from spatio-temporal gait templates for classification purposes. Experimental results prove the validity and robustness of the proposed method for gait recognition tasks with several covariates.