This paper presents a multi-modal analysis of human-computer interactions based on automatic inference of expressions in speech. It describes an automatic inference system that recognizes aural expressions of emotions, complex mental states and expression mixtures. The implementation is based on the observation that different vocal features distinguish different expressions. The system was trained on an English database (MindReading) and then applied to a Hebrew multi-modal database of naturally evoked expressions (Doors). This paper describes the statistical and dynamic analysis of sustained interactions from the Doors database. The analysis is based on the correlation between the inferred expressions and events, physiological cues such as galvanic skin response, and behavioural cues. The presented analysis indicates that vocal expressions of complex mental states such as thinking, certainty and interest are not necessarily unique to one language and culture. The system provides an analysis tool for sustained human-computer interactions.