Audio-visual speech recognition systems can be categorised into those that integrate audio-visual features before a decision is made (feature fusion) and those that integrate the decisions of separate recognisers for each modality (decision fusion). Decision fusion has been applied at the level of individual analysis time frames, phone segments and isolated words, but in its basic form it cannot be used for continuous speech recognition because of the combinatorial explosion of word-string hypotheses that must be evaluated. We present a case for decision fusion at the utterance level and propose an algorithm, which we call N-best decision fusion, that can be applied efficiently to continuous speech recognition tasks. The system was tested on a single-speaker, continuous digit recognition task in which the audio stream was contaminated by additive multi-speaker babble noise. The audio-visual system achieved lower word error rates than the audio-only system under all signal-to-noise conditions tested, with the magnitude of the improvement depending on the signal-to-noise ratio.
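The core idea of utterance-level N-best decision fusion can be illustrated with a minimal sketch: the audio recogniser emits an N-best list of word-string hypotheses with scores, and each hypothesis is rescored with the visual recogniser before the final decision. The function name, the interpolation weight `lam`, and the log-linear combination shown here are illustrative assumptions, not the paper's exact formulation.

```python
def nbest_decision_fusion(audio_nbest, visual_score, lam=0.7):
    """Sketch of utterance-level N-best decision fusion (assumed form).

    audio_nbest  : list of (hypothesis, audio_log_likelihood) pairs,
                   the N-best output of the audio-only recogniser
    visual_score : callable mapping a hypothesis string to its visual
                   log-likelihood (from the visual recogniser)
    lam          : hypothetical audio stream weight in [0, 1]; in practice
                   it would be tuned per signal-to-noise condition
    """
    best_hyp, best_score = None, float("-inf")
    for hyp, a_score in audio_nbest:
        # Log-linear combination of the two modality scores; only the
        # N audio hypotheses are evaluated, avoiding the combinatorial
        # explosion of scoring all possible word strings.
        combined = lam * a_score + (1.0 - lam) * visual_score(hyp)
        if combined > best_score:
            best_hyp, best_score = hyp, combined
    return best_hyp, best_score
```

With noisy audio, the visual scores can overturn the audio-only ranking: a hypothesis placed second by the audio recogniser wins once its stronger visual evidence is folded in.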