Since most automatic speech recognition (ASR) systems still suffer from adverse acoustic conditions and insufficient acoustic modeling, recognition robustness can be improved by integrating further information sources, such as additional acoustic channels, modalities, or models. Regarding the question of information fusion, interesting parallels to problems in digital communications can be observed, where the turbo principle revolutionized reliable communication. In this paper, we provide new perspectives on turbo ASR: first, we introduce a compact formulation of turbo automatic speech recognition; second, we present a shape-based visual feature extraction algorithm that requires no learning paradigms; third, we show an application to an audio-visual speech recognition task on a large data set, where our proposed method clearly outperforms both the iterative approach introduced by Shivappa et al. and a conventional coupled-hidden-Markov-model approach, achieving up to a 23.8% relative reduction in word error rate.