We introduce a vision-based, marker-less upper-body pose tracking approach that first tracks the 3D movements of the extremities, namely the head and hands. Based on an upper-body model, these extremity movements are then used to predict the motion of the whole upper body by solving an inverse kinematics problem. Experimental validation showed the promise of applying this approach in smart environments and HCI scenarios, e.g. observing user activity in driving, meeting-room, and teleconference settings.
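The inverse-kinematics step can be illustrated with a minimal sketch, not the paper's actual method: given a tracked hand position, recover joint angles for a planar two-link arm (shoulder and elbow). The function name, link lengths, and the restriction to 2D are illustrative assumptions; the full approach operates on a 3D upper-body model.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics for a planar two-link arm (illustrative).

    Given a target wrist position (x, y) and link lengths l1 (upper arm)
    and l2 (forearm), return (shoulder, elbow) joint angles in radians,
    or None if the target is out of reach.
    """
    d2 = x * x + y * y
    # Law of cosines gives the cosine of the elbow angle.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if c2 < -1.0 or c2 > 1.0:
        return None  # target unreachable with these link lengths
    elbow = math.acos(c2)  # elbow-down solution (a mirrored one also exists)
    k1 = l1 + l2 * math.cos(elbow)
    k2 = l2 * math.sin(elbow)
    shoulder = math.atan2(y, x) - math.atan2(k2, k1)
    return shoulder, elbow
```

Even this toy case shows why extremity tracking under-constrains the pose: two elbow solutions exist for most targets, so a body model or temporal smoothness is needed to pick the plausible one.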