In this paper we present our ongoing collection of multimodal real-world driving data. Video, speech, driving behavior, and physiological signals from 150 drivers have already been collected. To provide a more meaningful description of the collected data, we propose a transcription protocol based on six major groups: driver mental state, driver actions, driver's secondary task, driving environment, vehicle status, and speech/background noise. Data from 30 drivers have been transcribed. We then show how transcription reliability can be improved by properly training annotators. Finally, we integrate transcriptions, driving behavior, and physiological signals using a Bayesian network to estimate a driver's level of irritation. The estimates are compared with the actual values, as assessed by the drivers themselves. Preliminary results are very encouraging.