Designing a multimodal corpus of audio-visual speech using a high-speed camera

Alexey Karpov; Andrey Ronzhin; Irina Kipyatkova

doi:10.1109/ICoSP.2012.6491539

Source

2012 IEEE 11th International Conference on Signal Processing, vol. 1, pp. 519-522

Abstract

In this paper, we present research on designing and processing an audio-visual speech database for an automatic Russian speech recognition system, recorded with an Oktava MK-012 microphone and a JAI Pulnix RMC-6740GE high-speed camera (200 frames per second). We describe the developed audio-visual speech recording system, which synchronizes and fuses the audio and video data recorded by the independent sensors. The system automatically detects voice activity in the audio signal and stores only speech fragments, discarding non-informative signals. It also takes into account and processes the natural asynchrony of the two speech modalities. We present methods for extracting acoustic features (based on Mel-frequency cepstral coefficients) and visual speech features (pixel-based features of the mouth region), and for temporal segmentation of the multimodal data (by forced alignment).
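
The abstract does not say which voice activity detector or synchronization mechanism the recording system uses. As a rough illustration only, the sketch below assumes a simple short-time energy threshold for voice activity detection and a shared clock origin for both sensors, then maps each detected speech interval onto frame indices of a 200 frames-per-second video stream. The function names, the 10 ms analysis frame, and the threshold value are hypothetical choices made for this example and are not taken from the paper.

```python
import math

VIDEO_FPS = 200  # camera frame rate stated in the abstract


def frame_energies(samples, sample_rate, frame_ms=10):
    """Mean energy of consecutive non-overlapping frames (assumed 10 ms long)."""
    frame_len = sample_rate * frame_ms // 1000
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech(samples, sample_rate, threshold, frame_ms=10):
    """Return (start_ms, end_ms) intervals whose frame energy exceeds threshold.

    This is a plain energy threshold, assumed here for illustration;
    it is not the detector described in the paper.
    """
    energies = frame_energies(samples, sample_rate, frame_ms)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        if energy > threshold and seg_start is None:
            seg_start = i  # speech begins at this frame
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start * frame_ms, i * frame_ms))
            seg_start = None
    if seg_start is not None:  # signal ended while still in speech
        segments.append((seg_start * frame_ms, len(energies) * frame_ms))
    return segments


def video_frame_range(segment_ms, fps=VIDEO_FPS):
    """Map an audio interval in milliseconds to a half-open video frame range.

    Assumes audio and video share the same time origin. Integer arithmetic
    avoids floating-point rounding at frame boundaries.
    """
    start_ms, end_ms = segment_ms
    first = start_ms * fps // 1000          # floor
    last = -(-end_ms * fps // 1000)         # ceiling
    return first, last


if __name__ == "__main__":
    sr = 16000
    # 0.5 s silence, 0.25 s of a 440 Hz tone standing in for speech, 0.25 s silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(4000)]
    signal = [0.0] * 8000 + tone + [0.0] * 4000
    for seg in detect_speech(signal, sr, threshold=0.01):
        print(seg, "->", video_frame_range(seg))
```

With the synthetic signal above, the single detected interval runs from 500 ms to 750 ms, which corresponds to video frames 100 up to (but not including) 150. A real recording would need a more robust detector and an explicit measurement of the offset between the two sensors.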

Identifiers

book ISSN: 2164-5221
book e-ISSN: 2164-523X
book ISBN: 978-1-4673-2196-9
book e-ISBN: 978-1-4673-2197-6, 978-1-4673-2195-2
DOI: 10.1109/ICoSP.2012.6491539

Authors

Karpov, Alexey

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), Russia

Ronzhin, Andrey

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), Russia

Kipyatkova, Irina

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), Russia

Keywords

high-speed camera; audio-visual speech; multimodal system; automatic speech recognition; computer vision

Additional information

Data set: ieee

Publisher

IEEE
