This paper presents a method to extract structural spectral features from spectral envelopes using what-where autoencoders (WWAE) for statistical parametric speech synthesis (SPSS). A WWAE is constructed by concatenating a convolutional net for input encoding and a deconvolutional net for reconstruction. The output values of the max-pooling layer in the encoder and the positions of the max-pooling...
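The "what/where" split can be illustrated with a minimal NumPy sketch. It assumes 1-D input and non-overlapping pooling windows, and it is not the network described in the abstract. The function names `what_where_pool` and `what_where_unpool` are hypothetical. Pooling keeps the maximum of each window ("what") together with its offset inside the window ("where"). Unpooling uses those offsets to put each value back in the position it came from.

```python
import numpy as np

def what_where_pool(x, width):
    """Non-overlapping 1-D max pooling returning both the pooled values
    ("what") and the argmax offset inside each window ("where")."""
    x = np.asarray(x, dtype=float)
    n = len(x) // width                      # drop any incomplete trailing window
    windows = x[: n * width].reshape(n, width)
    where = windows.argmax(axis=1)           # position of the max inside each window
    what = windows[np.arange(n), where]      # the max value itself
    return what, where

def what_where_unpool(what, where, width):
    """Place each pooled value back at its recorded offset; other slots are zero."""
    out = np.zeros((len(what), width))
    out[np.arange(len(what)), where] = what
    return out.reshape(-1)

x = [0.1, 0.9, 0.4, 0.2, 0.7, 0.3]
what, where = what_where_pool(x, 2)
recon = what_where_unpool(what, where, 2)
```

Without the "where" indices, an unpooling step would have to guess where inside each window the maximum belongs. Keeping them is what lets the decoder restore the positions of the peaks.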
This paper proposes methods of using restricted Boltzmann machines (RBM) to generate the sequence of lip images for visual speech synthesis. The aim of our proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach for lip synthesis. Two model structures using RBMs to model and generate lip movements are investigated in this...
In our previous work, we presented a cross-stream dependency modeling method for hidden Markov model (HMM) based parametric speech synthesis. In that method, a multi-space probability distribution (MSD) was adopted for F0 modeling, and voicing decision errors severely degraded the accuracy of the generated spectral features. Therefore, a cross-stream dependency modeling method using continuous...
The ordering property is an important property of line spectral pairs (LSP) and is closely related to the naturalness of the reconstructed speech. When LSP is adopted as the spectral feature in HMM-based parametric speech synthesis, the ordering property cannot be guaranteed, because conventional systems use diagonal covariance matrices and ignore the cross-dimension correlation of the LSP vector. This will cause an instability issue...
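The following short sketch shows the ordering property, 0 < w_1 < w_2 < ... < w_p < pi for LSP frequencies in radians. It also shows how modeling each dimension independently can break that property. The two-dimensional means and standard deviations are hypothetical toy numbers, not values from the paper, and `lsp_is_ordered` is an illustrative helper.

```python
import numpy as np

def lsp_is_ordered(lsp):
    """True if 0 < w_1 < w_2 < ... < w_p < pi (LSP frequencies in radians)."""
    w = np.asarray(lsp, dtype=float)
    return bool(w[0] > 0.0 and w[-1] < np.pi and np.all(np.diff(w) > 0.0))

# Two adjacent LSP dimensions modeled independently (diagonal covariance).
mean = np.array([0.30, 0.32])
std = np.array([0.05, 0.05])

# Each dimension deviates by one standard deviation, in opposite directions.
# Nothing in a diagonal model prevents this combination.
sample = mean + std * np.array([1.0, -1.0])   # roughly [0.35, 0.27]
```

The means are correctly ordered, but the sample is not. A diagonal model cannot prevent this, because it treats the two dimensions as unrelated.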
This paper presents a minimum generation error (MGE) training method for hidden Markov model (HMM) based prediction of articulatory movements when both text and audio inputs are given. In this method, the MGE criterion replaces the maximum likelihood (ML) criterion for estimating the model parameters of the unified acoustic-articulatory HMMs. Unlike MGE training for HMM-based acoustic...
With the development of HMM-based parametric speech synthesis algorithms, it has become easy for impostors to generate synthetic speech with a specific speaker's characteristics, which poses a serious threat to state-of-the-art speaker verification systems. In this paper, we investigate the differences in Mel-cepstral (MCEP) features between natural and synthetic speech. Experiments demonstrate that we can...
In the current hidden Markov model (HMM) based unit selection speech synthesis method, the optimal phone-sized candidate units are selected according to the maximum likelihood (ML) criterion of the HMMs trained for various acoustic features. This paper introduces statistical models for syllable-level F0 features into this method. Unlike the frame-level F0 parameters used in the current framework,...
In this paper, an automatic prosodic phrase boundary labeling method for speech synthesis databases is presented. The method consists of two stages: a training stage and a labeling stage. In the training stage, context-dependent HMMs, of the kind commonly adopted in HMM-based parametric speech synthesis, are estimated from a training database with manual prosodic labels. In the labeling stage, the...
To address the overall-micro modeling issue of the current prosody model in HMM-based speech synthesis, a hierarchical F0 modeling method has been proposed. In this method, different kinds of pitch patterns are characterized by different prosodic layers, and a minimum generation error (MGE) training framework is used to optimize the F0 models of all layers simultaneously. This paper investigates the importance of...
This paper presents a minimum generation error (MGE) training method that uses a weighted Euclidean distance measure on line spectral pairs (LSP) for HMM-based speech synthesis. The weighted Euclidean distance on LSP is introduced as the measure of generation error, to improve the consistency between the model training criterion and the subjective perception of the distortion of synthetic...
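The next block is a minimal sketch of an MGE-style generation error that uses a weighted Euclidean distance on LSP frames. The weighting is an assumption. It uses inverse-harmonic-mean style weights, w_i = 1/(x_i - x_{i-1}) + 1/(x_{i+1} - x_i) with x_0 = 0 and x_{p+1} = pi. This is a common LSP weighting that emphasizes closely spaced pairs, which lie near spectral peaks. The abstract does not say which weights the paper uses, and the helper names are hypothetical.

```python
import numpy as np

def ihm_weights(lsp):
    """Assumed inverse-harmonic-mean weights: large where neighbouring LSPs are close."""
    w = np.asarray(lsp, dtype=float)
    padded = np.concatenate(([0.0], w, [np.pi]))
    left = w - padded[:-2]     # x_i - x_{i-1}, with x_0 = 0
    right = padded[2:] - w     # x_{i+1} - x_i, with x_{p+1} = pi
    return 1.0 / left + 1.0 / right

def weighted_generation_error(generated, natural):
    """Sum over frames of the weighted squared Euclidean distance between
    generated and natural LSP vectors; weights come from the natural frame."""
    gen = np.asarray(generated, dtype=float)
    nat = np.asarray(natural, dtype=float)
    total = 0.0
    for g, n in zip(gen, nat):
        total += float(np.sum(ihm_weights(n) * (g - n) ** 2))
    return total
```

In MGE training, an error of this form would be minimized with respect to the HMM parameters that produce `generated`. That optimization step is not shown here.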
In this paper, we present a novel approach to relax the stereo-data constraint required by a series of algorithms for noise-robust speech recognition. Using the SPLICE algorithm as a demonstration, we generate pseudo-clean features with HMM-based speech synthesis and use them to replace the ideal clean features from one of the stereo channels. Experimental results on the Aurora2 database show that the...
This paper proposes a state duration modeling method using a full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix replaces the conventional diagonal covariance matrix in the multi-dimensional Gaussian distribution that models the state durations of each context-dependent phoneme. At the synthesis stage, the state durations are predicted using the clustered...
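One possible use of a full-covariance duration model can be sketched in NumPy. The sketch assumes the state durations are given as rows of per-phone samples. The helper names are hypothetical. Conditioning on a fixed total phone duration is an illustrative assumption, not the paper's stated procedure. For a Gaussian with mean m and covariance S, the conditional mean given sum(d) = T is m + S1 (T - 1'm) / (1'S1). With a diagonal S, this reduces to the familiar d_k = m_k + rho * var_k with rho = (T - sum m_k) / sum var_k.

```python
import numpy as np

def fit_duration_gaussian(durations):
    """Mean vector and full covariance of per-state durations (rows = phone samples)."""
    d = np.asarray(durations, dtype=float)
    return d.mean(axis=0), np.cov(d, rowvar=False)

def durations_given_total(mean, cov, total):
    """Conditional mean of the state-duration vector given sum(d) == total.
    The result is real-valued and would still need rounding to whole frames."""
    ones = np.ones_like(mean)
    s1 = cov @ ones
    return mean + s1 * (total - ones @ mean) / (ones @ s1)

m, c = fit_duration_gaussian([[2, 3, 4], [3, 4, 6], [4, 5, 8]])
```

With the full covariance, extra frames are spread across states according to how the states co-vary in the training data, rather than only according to each state's own variance.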