A dynamic Gaussian process for voice conversion

Dong-Yan Huang; Minghui Dong; Haizhou Li

doi:10.1109/ICMEW.2013.6618271

Source

2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1-4

Abstract

In this paper, we explore learning techniques based on dynamic Gaussian processes (DGP) for voice conversion. In particular, we propose a dynamic squared exponential GP combined with sparse partial least squares (SPLS) to model nonlinearities and to capture the dynamics in the source data. Dynamics are modeled by concatenating each frame with its previous and next frames. SPLS regression is used to find the mapping function while avoiding overfitting. The proposed dynamic GP-based technique is simple, efficient, and accurate, and requires no extensive tuning. Experimental results show that the proposed approach produces converted voices that are close to the original target voice and substantially improves sound quality over the state-of-the-art Gaussian mixture model-based approach.
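
The two ingredients named in the abstract, frame concatenation and a squared exponential kernel, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stack_context`, `se_kernel`, `gp_predict`), the context width `k`, the edge-padding choice, and the hyperparameter defaults are all assumptions made for illustration, and the SPLS step is not reproduced. A plain GP posterior mean stands in for the mapping function.

```python
import numpy as np


def stack_context(frames, k=1):
    """Concatenate each frame with its k previous and k next frames.

    frames: array of shape (n_frames, n_dims).
    Edge frames are repeated as padding (an assumption of this sketch).
    Returns an array of shape (n_frames, (2 * k + 1) * n_dims).
    """
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * k + 1)])


def se_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-3, length_scale=1.0, variance=1.0):
    """GP posterior mean at x_test (plain GP regression, no SPLS)."""
    k_train = se_kernel(x_train, x_train, length_scale, variance)
    k_train += noise * np.eye(x_train.shape[0])
    k_cross = se_kernel(x_test, x_train, length_scale, variance)
    return k_cross @ np.linalg.solve(k_train, y_train)


# Toy data standing in for aligned source and target spectral features.
source = np.arange(10, dtype=float).reshape(5, 2)
target = np.sin(source)

x = stack_context(source, k=1)  # each row: [previous, current, next]
converted = gp_predict(x, target, x, noise=1e-6, length_scale=2.0)
```

Here each source frame is replaced by a context window before the kernel is evaluated, so the regression sees local temporal structure rather than isolated frames. In practice the source and target frames would first need to be time-aligned, which this sketch does not cover.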