Head-mounted displays (HMDs) have gained more and more interest recently. They can enable people to communicate with each other from anywhere, at anytime. However, since most HMDs today are only equipped with cameras pointing outwards, the remote party would not be able to see the user wearing the HMD. In this paper, we present a system for facial expression tracking based on head-mounted, inward looking cameras, such that the user can be represented with animated avatars at the remote party. The main challenge is that the cameras can only observe partial faces since they are very close to the face. We experiment with multiple machine learning algorithms to estimate facial expression parameters based on training data collected with the assistance of a Kinect depth sensor. Our results show that we can reliably track people's facial expression even from very limited view angles of the cameras.