This paper presents a novel multimedia human-computer interaction system based on Kinect and depth image understanding. Understanding user intent is one of the research directions in human-machine interfaces. An industrialized society aims to satisfy material needs in both quantity and quality, which serve as the measure of people's living standards. The purpose of man-machine interface design is to make the computer more intelligent, so that it can do a wider range of work while gradually reducing the demands placed on those who use it, and can therefore serve users who lack any computer knowledge or experience. The user is the end consumer of computer resources. In feature extraction, a good feature discriminates strongly between samples of different categories while keeping the feature dimension and the amount of computation as low as possible. Gradient features and point features are the two common categories of features used for human body recognition in visible images. On this basis, we integrate Kinect and the depth image understanding paradigm as the core of a new multimedia human-computer interaction system. Experimental simulation demonstrates the effectiveness and overall feasibility of the method.
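To make the notion of a gradient feature concrete, the following is a minimal sketch of one generic way to summarize a depth image: a magnitude-weighted histogram of gradient orientations. It is an illustration only and is not the feature extractor used in this paper. The function name `gradient_orientation_histogram`, the choice of 8 bins, and the synthetic ramp image are all assumptions introduced here; a real Kinect depth frame would replace the synthetic input.

```python
import math


def gradient_orientation_histogram(depth, bins=8):
    """Magnitude-weighted histogram of gradient orientations (hypothetical helper).

    `depth` is a 2D list of depth values (rows of equal length).
    The result is L1-normalized, or all zeros for a flat image.
    """
    rows, cols = len(depth), len(depth[0])
    hist = [0.0] * bins
    # Skip the one-pixel border so central differences stay in bounds.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            gy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat pixel, contributes nothing
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Clamp guards against rounding that lands exactly on 2*pi.
            idx = min(int(angle / (2 * math.pi) * bins), bins - 1)
            hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Synthetic stand-in for a depth frame: depth increases left to right.
ramp = [[10.0 * x for x in range(5)] for _ in range(5)]
print(gradient_orientation_histogram(ramp))
```

The low, fixed dimension of the histogram (here 8 values regardless of image size) reflects the stated goal of keeping feature dimension and computation small while still separating surfaces with different orientations.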