Development of audio-visual speech corpus toward speaker-independent Japanese LVCSR

Kazuto Ukai; Satoshi Tamura; Satoru Hayamizu

doi:10.1109/ICSDA.2016.7918976

Source

2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA), pp. 12-15

Abstract

Building corpora is essential for Large Vocabulary Continuous Speech Recognition (LVCSR). Moreover, visual information such as lip images is effective in overcoming the performance degradation caused by noise. In this paper, therefore, we focus on collecting speech and lip-image data for audio-visual LVCSR. Audio-visual speech data were obtained from 12 speakers, each of whom uttered the ATR503 phonetically balanced sentences. The data were recorded in acoustically and visually clean environments. Using the data, we conducted recognition experiments. Mel Frequency Cepstral Coefficients (MFCCs) and eigenlip features were extracted, and multi-stream Hidden Markov Models (HMMs) were built. We compared performance in the clean condition with that in noisy environments. We found that visual information can compensate for the performance degradation caused by noise. The results also indicate that visual speech recognition must be improved to achieve high-performance audio-visual LVCSR.
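
As a rough illustration of two techniques named in the abstract, the sketch below computes eigenlip features as a PCA projection of flattened lip images and combines audio and visual log-likelihoods with a stream weight, as is commonly done in multi-stream HMMs. It is a minimal sketch, not the authors' implementation: the function names, the image size, the random stand-in data, and the convention that the two stream weights sum to 1 are all assumptions made for illustration.

```python
import numpy as np


def fit_eigenlips(images, n_components):
    """Learn a PCA basis ("eigenlips") from lip images of shape (N, H, W)."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal axes of the centered data,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def eigenlip_features(images, mean, basis):
    """Project lip images onto the eigenlip basis: (N, n_components)."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    return (flat - mean) @ basis.T


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    Assumed convention: the visual weight is 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


# Hypothetical stand-in data: 20 random 8x12 "lip images".
rng = np.random.default_rng(0)
lips = rng.random((20, 8, 12))
mean, basis = fit_eigenlips(lips, n_components=5)
features = eigenlip_features(lips, mean, basis)
```

In a setup like this, lowering `audio_weight` as the acoustic noise level rises shifts the decision toward the visual stream, which is the mechanism by which lip information can offset noise-induced errors.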

Identifiers

book e-ISSN: 2472-7695
book e-ISBN: 978-1-5090-3516-8
DOI: 10.1109/ICSDA.2016.7918976

Authors

Ukai, Kazuto

Department of Information Science, Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu, 501-1193 Japan

Tamura, Satoshi

Department of Information Science, Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu, 501-1193 Japan

Hayamizu, Satoru

Department of Information Science, Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu, 501-1193 Japan

Keywords

audio-visual speech recognition; LVCSR; multi-stream HMM; lipreading

Additional information

Data set: ieee

Publisher

IEEE
