An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition

Takami Yoshida; Kazuhiro Nakadai; Hiroshi G. Okuno

doi:10.1007/978-3-642-13022-9_6

Source

Lecture Notes in Computer Science, Trends in Applied Intelligent Systems, Application to Robotics, pp. 51-61

Abstract

Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is considerable acoustic and visual noise. In this paper, we improve the audio-visual VAD of our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for robots. Empirical results show the effectiveness of the proposed method in terms of VAD.
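The abstract does not spell out the exact hangover rule. As a rough illustration only, the sketch below applies 1-D binary erosion and dilation to a frame-level VAD decision sequence (1 = speech, 0 = non-speech). The function names, the symmetric window of `radius` frames on each side, and the opening-then-closing order are hypothetical choices made for this example, not the procedure or parameters used in the paper.

```python
from typing import List


def dilate(frames: List[int], radius: int) -> List[int]:
    """Binary dilation: a frame becomes 1 if any frame within +/- radius is 1.

    Windows are truncated at the sequence edges (an assumption).
    """
    n = len(frames)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(1 if any(frames[lo:hi]) else 0)
    return out


def erode(frames: List[int], radius: int) -> List[int]:
    """Binary erosion: a frame stays 1 only if every frame within +/- radius is 1.

    Windows are truncated at the sequence edges (an assumption).
    """
    n = len(frames)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(1 if all(frames[lo:hi]) else 0)
    return out


def smooth_vad(frames: List[int], radius: int) -> List[int]:
    """Hypothetical smoothing of frame-level VAD decisions.

    Opening (erode, then dilate) drops isolated speech detections shorter
    than 2*radius + 1 frames; closing (dilate, then erode) then fills
    non-speech gaps shorter than 2*radius + 1 frames inside utterances.
    """
    opened = dilate(erode(frames, radius), radius)
    closed = erode(dilate(opened, radius), radius)
    return closed


if __name__ == "__main__":
    # A one-frame false alarm at index 2 and a one-frame dropout at index 9.
    raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
    print(smooth_vad(raw, radius=1))
```

With `radius=1`, the spurious detection at frame 2 is removed and the one-frame dropout at frame 9 is bridged, leaving a single speech segment over frames 5 to 13. A real system would tune the window sizes (and possibly use different sizes for erosion and dilation) to the frame rate and the noise conditions.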

Identifiers

Series ISSN: 0302-9743
Series e-ISSN: 1611-3349
Book ISBN: 978-3-642-13021-2
Book e-ISBN: 978-3-642-13022-9
DOI: 10.1007/978-3-642-13022-9_6

Authors

Takami Yoshida

Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Tokyo, Japan

Kazuhiro Nakadai

Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Tokyo, Japan
Honda Research Institute Japan Co., Ltd., Saitama, Japan

Hiroshi G. Okuno

Kyoto University, Graduate School of Informatics, Kyoto, Japan

Keywords

Audio-Visual integration; Voice Activity Detection; Speech Recognition

Additional information

Data set: Springer

Publisher

Springer Berlin Heidelberg

chapter