Key frame extraction plays an important role in video abstraction. Traditional key frame extraction methods represent a frame using only color, texture, or shape features, while motion is either ignored or inappropriately modeled. Since motion carries much of the semantic information in video analysis, we propose a compact representation of the dominant motion information in each frame, based on a mean shift analysis procedure. The Earth Mover's Distance (EMD) is then employed as the similarity metric for this motion representation. Moreover, we propose a novel temporal k-means clustering algorithm for key frame extraction, which naturally incorporates the sequential constraint into the extracted key frames. Experimental results demonstrate the effectiveness of our approach.
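The sketch below illustrates one way a temporal k-means can enforce a sequential constraint. It is not the authors' algorithm, whose details are not given here. It assumes that each frame is already a fixed-length feature vector (for example, a motion histogram), that clusters must be contiguous runs of frames, and that each run's key frame is the frame closest to the run's mean. It also uses squared Euclidean distance as a stand-in for EMD. The names `temporal_kmeans`, `sq_dist`, and `mean` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed stand-in for EMD in this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def temporal_kmeans(frames: Sequence[Vector], k: int,
                    max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Split frames into k contiguous segments.

    Returns (segment start indices, key frame index of each segment).
    """
    n = len(frames)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(frames)")
    starts = [j * n // k for j in range(k)]  # even initial split
    for _ in range(max_iter):
        ends = starts[1:] + [n]
        # Update step: each segment's center is the mean of its frames.
        centers = [mean(frames[s:e]) for s, e in zip(starts, ends)]
        new_starts = starts[:]
        # Assignment step under the sequential constraint: frames may only move
        # between neighbouring segments, by shifting the boundary between them.
        for j in range(1, k):
            left, right = new_starts[j - 1], ends[j]

            def cost(b: int) -> float:
                # Frames [left, b) go to segment j-1, frames [b, right) to segment j.
                return (sum(sq_dist(frames[i], centers[j - 1]) for i in range(left, b))
                        + sum(sq_dist(frames[i], centers[j]) for i in range(b, right)))

            best_b, best_cost = starts[j], cost(starts[j])
            for b in range(left + 1, right):  # both segments keep >= 1 frame
                c = cost(b)
                if c < best_cost:  # strict: keep the current boundary on ties
                    best_b, best_cost = b, c
            new_starts[j] = best_b
        if new_starts == starts:
            break  # converged: no boundary moved
        starts = new_starts
    ends = starts[1:] + [n]
    # Key frame of each segment: the frame nearest to the segment mean.
    keys = []
    for s, e in zip(starts, ends):
        center = mean(frames[s:e])
        keys.append(min(range(s, e), key=lambda i: sq_dist(frames[i], center)))
    return starts, keys


if __name__ == "__main__":
    toy = [[0.0], [0.1], [5.0], [5.1], [5.0], [4.9], [10.0], [10.1], [10.2]]
    print(temporal_kmeans(toy, k=3))
```

The algorithm only ever moves the boundary between neighbouring segments, so every cluster stays a contiguous run and the key frames come out in temporal order. With the centers fixed, a boundary moves only if the move strictly lowers the cost. Recomputing the means never raises the cost. The loop therefore terminates. If a true EMD replaced `sq_dist`, the component-wise mean would likely need to be replaced by a medoid, because the mean minimizes squared Euclidean distance, not EMD.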