Pitch prediction from Mel-generalized cepstrum — a computationally efficient pitch modeling approach for speech synthesis

M V Achuth Rao; Prasanta Kumar Ghosh

doi:10.23919/EUSIPCO.2017.8081485


Source

2017 25th European Signal Processing Conference (EUSIPCO), pp. 1629-1633

Abstract

Text-to-speech (TTS) systems are often used as part of the user interface in wearable devices. Because wearable devices have limited memory, computational power, and battery power, a TTS system that requires less memory and computation is desirable. Conventional speech synthesis systems have separate models for pitch (F0-model) and for the spectral representation, namely Mel generalized coefficients (MGC) (MGC-model). In this paper, we estimate pitch from the MGC produced by the MGC-model instead of using a separate F0-model. Pitch is obtained from the estimated MGC by a statistical mapping based on a Gaussian mixture model (GMM). Experiments on the CMU-ARCTIC database demonstrate that the proposed GMM-based F0-model, even with a single mixture, causes no significant loss in the naturalness of the synthesized speech. In addition to reducing computational complexity, it yields a ∼93% reduction in the number of parameters compared to the separate F0-model.
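To make the idea of a GMM-based mapping from MGC to pitch concrete, the following is a minimal sketch of the single-mixture case, where the mapping reduces to the conditional mean of a joint Gaussian over the MGC vector and the pitch value. It is not the authors' implementation. The function names, the 4-dimensional toy "MGC" vectors, the use of log-F0 as the target, and the synthetic data are all assumptions made for illustration; voiced/unvoiced handling is omitted.

```python
import numpy as np


def fit_single_gaussian(mgc, f0):
    """Fit one joint Gaussian over [mgc, f0], i.e. a GMM with a single mixture.

    mgc: array of shape (n_frames, d); f0: array of shape (n_frames,).
    Returns the joint mean (d + 1,) and joint covariance (d + 1, d + 1).
    """
    joint = np.column_stack([mgc, f0])
    mean = joint.mean(axis=0)
    cov = np.cov(joint, rowvar=False)
    return mean, cov


def predict_f0(mgc, mean, cov):
    """Return the conditional mean E[f0 | mgc] under the joint Gaussian."""
    d = mgc.shape[1]
    mu_x, mu_y = mean[:d], mean[d]
    sigma_xx = cov[:d, :d]   # covariance of the MGC part
    sigma_xy = cov[:d, d]    # cross-covariance between MGC and f0
    weights = np.linalg.solve(sigma_xx, sigma_xy)
    return mu_y + (mgc - mu_x) @ weights


# Synthetic stand-in data (assumption): 500 frames of 4-dimensional features
# and a log-F0 target that depends linearly on them plus small noise.
rng = np.random.default_rng(0)
mgc = rng.normal(size=(500, 4))
true_w = np.array([0.5, -0.2, 0.1, 0.3])
log_f0 = 5.0 + mgc @ true_w + 0.01 * rng.normal(size=500)

mean, cov = fit_single_gaussian(mgc, log_f0)
pred = predict_f0(mgc, mean, cov)
rmse = float(np.sqrt(np.mean((pred - log_f0) ** 2)))
```

With more than one mixture, the prediction would instead be a posterior-weighted sum of such per-mixture conditional means, with the mixture parameters typically estimated by expectation-maximization.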