In this paper, we present a part model for human action recognition in video. We represent video using a 3D HOG descriptor with a bag-of-features (BoF) model. To overcome the loss of temporal order inherent in the bag-of-features approach, we propose a novel multiscale local part model that preserves temporal context. Our method builds on several recent ideas, including dense sampling, local spatio-temporal (ST) features, the 3D HOG descriptor, the BoF representation, and non-linear SVMs. Preliminary results on the KTH action dataset show a higher recognition rate than recent studies.
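To make the bag-of-features representation concrete, the following is a minimal, hypothetical sketch (not the paper's code) of the quantization step: each local spatio-temporal descriptor is assigned to its nearest codeword, and the video is summarized as an L1-normalized histogram of codeword counts. The toy 2-D descriptors and 3-word codebook are illustrative placeholders, not the paper's actual settings.

```python
# Hedged sketch (not the paper's implementation): quantize local descriptors
# into a bag-of-features histogram by nearest-codeword assignment.

def bof_histogram(descriptors, codebook):
    """Return an L1-normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # squared Euclidean distance from this descriptor to each codeword
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Toy example: 2-D "descriptors" and a 3-word codebook (illustrative only;
# real 3D HOG descriptors are much higher-dimensional).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (5.2, 4.8), (0.0, 0.1)]
print(bof_histogram(descriptors, codebook))  # → [0.5, 0.25, 0.25]
```

In a full pipeline, the codebook would typically be learned by clustering (e.g., k-means) over training descriptors, and the resulting histograms fed to a non-linear SVM.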