Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-Based Interactive Toy

Chun Keung Chau; Chak Shun Lai; Bertram Emil Shi

doi:10.1007/3-540-45336-9_17

Source

Lecture Notes in Computer Science, Active Media Technology, Active Interfaces, pp. 134-143

Abstract

We describe an architecture for speech-recognition-based interactive toys and discuss the strategies we have adopted to meet the requirements this application imposes on the speech recognizer. In particular, we focus on the fact that speech recognizers used in interactive toys must deal with users whose ages range from children to adults. The large variations in vocal tract length between children and adults can significantly degrade the performance of speech recognizers. We compare two approaches to vocal tract length normalization (VTLN): feature-based VTLN and model-based VTLN. We explain why, intuitively, one might expect feature-based VTLN to outperform model-based VTLN, given the coarser frequency information used by the model-based approach. However, our results indicate that there is very little difference in performance between the two schemes.

Identifiers

series ISSN: 0302-9743
series e-ISSN: 1611-3349
book ISBN: 978-3-540-43035-3
book e-ISBN: 978-3-540-45336-9
DOI: 10.1007/3-540-45336-9_17

Authors

Chun Keung Chau

Hong Kong University of Science and Technology, Consumer Media Center/Human Language Technology Center, Department of Electrical and Electronic Engineering, Kowloon, Hong Kong

Chak Shun Lai

Hong Kong University of Science and Technology, Consumer Media Center/Human Language Technology Center, Department of Electrical and Electronic Engineering, Kowloon, Hong Kong

Bertram Emil Shi

Hong Kong University of Science and Technology, Consumer Media Center/Human Language Technology Center, Department of Electrical and Electronic Engineering, Kowloon, Hong Kong

Additional information

Data set: Springer

Publisher

Springer Berlin Heidelberg

chapter
