In this paper, we present a gesture recognition approach that enables real-time manipulation of projection content by detecting and recognizing the speaker's gestures from depth maps captured by a depth sensor. To overcome the limited measurement accuracy of the depth sensor, a robust background subtraction method is proposed for effective human body segmentation, and a distance map is adopted to detect human hands. A Potential Active Region (PAR) is utilized to ensure that only valid hand trajectories are generated, which avoids extra computational cost on recognizing meaningless gestures, and three different detection modes are designed to reduce complexity. The detected hand trajectory is temporally segmented into a series of movements, which are represented as Motion History Images. A set-based soft discriminative model is proposed to recognize gestures from these movements. Evaluated on our dataset, the proposed approach performs efficiently and robustly, achieving 90% accuracy.
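
To make the Motion History Image representation concrete, the following minimal sketch rasterizes a segmented hand trajectory into a Motion History Image using the standard update rule: a pixel is set to a duration value tau when motion occurs there, and otherwise decays by one per frame down to zero. This is an illustration under stated assumptions rather than the exact procedure used in this work; the function names `update_mhi` and `trajectory_to_mhi`, the one-pixel-per-frame hand position, and the parameter `tau` are all hypothetical.

```python
def update_mhi(mhi, motion_mask, tau):
    """Apply one frame of the standard MHI update rule.

    Pixels flagged in motion_mask are set to tau; all other pixels
    decay by 1, floored at 0. A new grid is returned.
    """
    return [
        [tau if moving else max(0, value - 1)
         for value, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, motion_mask)
    ]


def trajectory_to_mhi(points, width, height, tau):
    """Build an MHI from a list of (x, y) hand positions, one per frame.

    Assumption for illustration: the hand occupies a single pixel per
    frame. Positions outside the image are ignored, but the frame still
    causes the existing history to decay.
    """
    mhi = [[0] * width for _ in range(height)]
    for x, y in points:
        mask = [[False] * width for _ in range(height)]
        if 0 <= x < width and 0 <= y < height:
            mask[y][x] = True
        mhi = update_mhi(mhi, mask, tau)
    return mhi


if __name__ == "__main__":
    # A short left-to-right movement on a 3x1 image: the most recent
    # position holds the largest value, older positions have decayed.
    print(trajectory_to_mhi([(0, 0), (1, 0), (2, 0)], width=3, height=1, tau=3))
```

In the resulting image, brighter values mark more recent hand positions, so a single static image encodes both the path and the direction of a movement.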