Hierarchical prosody modeling of English speech and its application to TTS

Chung-Yao Tsai; Chin-Kuan Kuo; Yih-Ru Wang; Sin-Horng Chen; I-Bin Liao; Chen-Yu Chiang

doi:10.1109/ICSDA.2014.7051427

Source

2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), pp. 1-6

Abstract

This paper proposes a hierarchical prosody modeling approach for English speech, extending the HPM approach proposed previously for Mandarin speech. The approach first designs a syllable-based statistical prosodic model that describes the relationships among the prosodic-acoustic features of the speech signal, the linguistic features of the associated text, and the prosodic tags representing the underlying prosodic structure of the speech. It then employs a prosody labeling and modeling algorithm to estimate the model parameters and label the prosodic tags of all training utterances simultaneously from a prosody-unlabeled speech corpus. Experimental results on a corpus containing many paragraph-length utterances from a female Chinese speaker who majored in English show that the inferred model parameters are all meaningful. The trained model is then used to generate prosodic information for a TTS system. An informal listening test shows that the synthetic speech sounds quite natural.