RGB-Depth data has greatly improved human pose estimation; however, an additional step is still needed to interpret sequences of human poses as more informative actions. In this paper, we explore extracting action patterns using temporal self-similarity from time-sequential skeletons recovered from such data. For each body joint, action patterns are extracted locally over the temporal extent of a given video. The standard bag-of-words framework is then employed to assemble these local patterns into an action model. Action recognition is performed with a Naive-Bayes-Nearest-Neighbor classifier that also accounts for the spatial independence of body joints. Experimental results on the benchmark UCF Kinect dataset suggest the effectiveness and promise of the proposed action patterns.
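As a rough illustration of the core idea, a per-joint temporal self-similarity matrix (SSM) can be computed from a skeleton sequence; the sketch below assumes each joint is tracked as a 3D trajectory and, as one common choice (not necessarily the descriptor used in this paper), measures pairwise Euclidean distances between frames:

```python
import numpy as np

def joint_ssm(traj):
    """Temporal self-similarity matrix for one body joint.

    traj: (T, 3) array of the joint's 3D positions over T frames.
    Returns a (T, T) matrix whose (i, j) entry is the Euclidean
    distance between the joint's positions at frames i and j.
    (Euclidean distance is an illustrative choice here.)
    """
    diff = traj[:, None, :] - traj[None, :, :]  # (T, T, 3) pairwise differences
    return np.linalg.norm(diff, axis=-1)

# Toy trajectory: 5 frames of a synthetic joint path.
traj = np.arange(15, dtype=float).reshape(5, 3)
S = joint_ssm(traj)
```

The resulting matrix is symmetric with a zero diagonal; local patterns extracted from such matrices per joint could then be quantized into a bag-of-words representation, as the abstract describes.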