We propose a graph-based approach for semi-automatic tracking of the human tongue in 2D+time ultrasound image sequences. We construct a graph capturing the intra-frame (spatial) and inter-frame (temporal) relationships between the dynamic contour vertices. Tongue contour tracking is formulated as a graph-labeling problem, in which each vertex is labeled with a displacement vector describing its motion. The optimal displacement labels are those that minimize a multi-label Markov random field energy with unary, pairwise, and ternary potentials, which capture image evidence, temporal regularization, and smoothness regularization, respectively. The regularization strength adapts to the reliability of the image features. Evaluation on real clinical data and comparative analyses with existing approaches demonstrate the accuracy and robustness of our method.
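
As a schematic illustration of the energy described above, one possible general form is sketched below. All symbols are assumptions introduced here for illustration and are not taken from the method itself: $d_i$ denotes the displacement label of vertex $i$, $\mathcal{V}$ the set of contour vertices, $\mathcal{E}_t$ the set of temporal (inter-frame) vertex pairs, $\mathcal{C}_s$ the set of spatial (intra-frame) vertex triples, and $w_t$, $w_s$ the regularization weights, which would vary with the local reliability of the image features.

```latex
% Schematic only: symbols and weighting scheme are illustrative assumptions.
\[
E(\mathbf{d}) =
  \underbrace{\sum_{i \in \mathcal{V}} \theta_i(d_i)}_{\text{image evidence (unary)}}
  + w_t \underbrace{\sum_{(i,j) \in \mathcal{E}_t} \theta_{ij}(d_i, d_j)}_{\text{temporal regularization (pairwise)}}
  + w_s \underbrace{\sum_{(i,j,k) \in \mathcal{C}_s} \theta_{ijk}(d_i, d_j, d_k)}_{\text{smoothness regularization (ternary)}},
\qquad
\mathbf{d}^{*} = \arg\min_{\mathbf{d}} E(\mathbf{d}).
\]
```

Under these assumptions, a ternary term over three neighboring contour vertices can penalize a second-order quantity such as curvature, which a pairwise term alone cannot express.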