Human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as medical care, human-computer interaction and virtual training. Feature extraction for this task is challenging because of variations in motion performance, recording settings and inter-personal differences. To meet these challenges, this paper proposes two types of feature extraction methods based on Kinect depth image sequences. The first assumes that evenly distributed position lines exist in the three-dimensional space of the frame differences; a line becomes active when the moving object touches it. The second maps a sequence of 16 successive frames to a single image by Speed Time Mapping (STM) or Time Depth Mapping (STDM) and extracts 36-dimensional spatial-temporal features from this image. These features are fed into a Support Vector Machine (SVM) to identify the action categories. The experiments compare the performance of the two methods and demonstrate the effectiveness of STDM.
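To make the second idea concrete, the following is a minimal sketch of collapsing 16 successive depth frames into a single image and reducing that image to a 36-dimensional vector. It is not the STM or STDM mapping itself, whose definitions are not given here. The sketch assumes that each pixel stores the normalized time of its most recent depth change and that the 36 features are the cell means of a 6 x 6 grid. The function names `time_weighted_map` and `grid_features`, the `threshold` parameter and the synthetic frames are all hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one depth image, stored as rows of depth values


def time_weighted_map(frames: List[Frame], threshold: float = 10.0) -> Frame:
    """Collapse a frame sequence into one image (assumed mapping, not STM/STDM).

    Each pixel holds the normalized index of the latest frame pair in which
    its depth changed by more than `threshold`, or 0.0 if it never changed.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    steps = len(frames) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for t in range(1, len(frames)):
        for r in range(rows):
            for c in range(cols):
                if abs(frames[t][r][c] - frames[t - 1][r][c]) > threshold:
                    out[r][c] = t / steps  # later motion overwrites earlier motion
    return out


def grid_features(image: Frame, grid: int = 6) -> List[float]:
    """Mean of each cell in a grid x grid partition (assumed feature layout).

    Requires at least `grid` rows and `grid` columns so that no cell is empty.
    """
    rows, cols = len(image), len(image[0])
    feats = []
    for gr in range(grid):
        r0, r1 = gr * rows // grid, (gr + 1) * rows // grid
        for gc in range(grid):
            c0, c1 = gc * cols // grid, (gc + 1) * cols // grid
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell))
    return feats


# Synthetic input: 16 frames of 18 x 18 pixels with background depth 100.0,
# and a vertical bar at depth 50.0 that moves one column to the right per frame.
frames = [
    [[50.0 if c == t else 100.0 for c in range(18)] for _ in range(18)]
    for t in range(16)
]
features = grid_features(time_weighted_map(frames))
print(len(features))  # 36
```

A vector of this kind, computed per 16-frame window, is the sort of input that would then be passed to an SVM classifier; the classifier itself is omitted from the sketch.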