In this paper we study two techniques for combining word-level features for emotion prediction. Prior research has focused primarily on turn-level features as predictors. The utility of word-level features has recently been highlighted, but it has been tested only on relatively small human-computer corpora. We extend previous work by investigating the strengths and weaknesses of two different techniques for using word-level features and by using a larger corpus of human-computer dialogue. Our results confirm that word-level pitch features fare better than turn-level ones regardless of the combination technique. In addition, we find that the two word-level combination techniques differ in their strengths and weaknesses in terms of precision and recall.