Use of Missing and Unreliable Data for Audiovisual Speech Recognition

Alexander Vorwerk; Steffen Zeiler; Dorothea Kolossa; Ramón Fernandez Astudillo; Dennis Lerch

doi:10.1007/978-3-642-21317-5_13

Source

Robust Speech Recognition of Uncertain or Missing Data > Applications: Multiple Speakers and Modalities > 345-375

Abstract

Under acoustically distorted conditions, any available video information is especially helpful for increasing recognition robustness. However, an optimal strategy for integrating audio and video information is difficult to find, since both streams may independently suffer from time-varying degrees of distortion. In this chapter, we show how missing-feature techniques for coupled HMMs can help us fuse information from both uncertain information sources. We also focus on the estimation of reliability for the video feature stream, which is obtained from a linear discriminant analysis (LDA) applied to a set of shape- and appearance-based features. The approach yields significant performance improvements under strongly distorted conditions and, in conjunction with stream weight tuning, is lower-bounded in performance by the better of the two single-stream recognizers under all tested conditions.
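
The stream weight tuning mentioned in the abstract can be illustrated with a minimal sketch. It assumes the formulation commonly used in multi-stream audiovisual recognition, in which the per-state log-likelihoods of the audio and video streams are combined linearly with a single weight between 0 and 1. The abstract does not state the chapter's exact combination rule, so the function name `fused_log_likelihood` and the parameter `audio_weight` are hypothetical and serve only as an illustration.

```python
def fused_log_likelihood(log_b_audio, log_b_video, audio_weight):
    """Linearly combine per-stream state log-likelihoods.

    Assumption: a single stream weight in [0, 1] is applied to the audio
    stream and its complement to the video stream. This is a common
    multi-stream formulation, not necessarily the chapter's exact rule.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video


# Hypothetical log-likelihoods of one HMM state for one frame.
audio_only = fused_log_likelihood(-2.0, -4.0, 1.0)  # video stream ignored
video_only = fused_log_likelihood(-2.0, -4.0, 0.0)  # audio stream ignored
balanced = fused_log_likelihood(-2.0, -4.0, 0.5)    # equal trust in both
```

At the two extreme weights the fused score reduces to that of a single stream, which suggests why a tuned weight can avoid falling below the better single-stream recognizer.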