This paper proposes an improved cross-lingual speaker adaptation technique that considers the differences between language-dependent average voices in a speech-to-speech translation system. A state-mapping-based method has been introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this method, the transforms estimated from the input language are applied to average voice...
This paper describes factor-analyzed voice models for realizing various voice characteristics in HMM-based speech synthesis. The eigenvoice method can synthesize speech with arbitrary voice characteristics by interpolating representative HMM sets. However, the objective of PCA is to accurately reconstruct each speaker-dependent HMM set, and this is not equivalent to estimating models which represent...
This paper proposes a new framework of speech synthesis based on the Bayesian approach. The Bayesian method is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters. In the proposed framework, all processes for constructing the system can be derived from one single predictive distribution which represents the basic problem of speech synthesis directly...
This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (multi-space probability distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian mixture model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function...
This paper proposes a novel stereo-based stochastic noise compensation technique based on trajectory GMMs. Although GMM-based noise compensation techniques such as SPLICE work effectively, their performance sometimes degrades due to the inappropriate dynamic characteristics caused by the frame-by-frame mapping. While the use of dynamic feature constraints in the mapping stage can alleviate this...
A new integrated model for simultaneously modeling linguistic and acoustic models, together with a training algorithm, is proposed. Usually, text-to-speech (TTS) systems based on the hidden Markov model (HMM) consist of text analysis and speech synthesis modules. Linguistic and acoustic model training are performed independently using different training data sets. Integrated model parameters were simultaneously...
In the conventional HMM-based speech synthesis framework, spectral features are modeled in one stream, and stream-dependent tree-based clustering is then applied to tie the model parameters. In this paper, we investigate several different stream-dependent tying structures for spectral features by splitting the feature vector into several streams. One splitting approach is to split each feature dimension...
This paper proposes an acoustic modeling technique based on an additive structure of context dependencies for HMM-based speech recognition. Typical context-dependent models, e.g., triphone HMMs, have direct dependencies on phonetic contexts, i.e., if a phonetic context is given, the Gaussian distribution is specified immediately. This paper assumes a more complex structure, an additive structure of...
In this paper, we propose separable lattice hidden Markov models, in which multiple hidden state sequences interact to model the observation on a lattice. The proposed model can be efficiently applied for modeling images, image sequences, 3-D object models and higher dimensional applications, due to the composite structure of Markov chains which reduces the complexity while retaining good properties...
In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time. This is an inappropriate representation of the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). Although a number of attempts to use explicit duration...
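As a minimal illustration of the first claim in the abstract above (not code from the paper): in a standard HMM with self-transition probability a, the probability of staying in a state for exactly d frames is a^(d-1) * (1 - a), a geometric distribution that decays by a constant factor a per frame. The function name and the value 0.8 below are assumptions chosen only for illustration.

```python
def geometric_duration_pmf(self_transition_prob, max_duration):
    """Implicit state duration distribution of a standard HMM.

    Staying exactly d frames means taking the self-transition (d - 1)
    times and then leaving once: p(d) = a**(d - 1) * (1 - a).
    """
    a = self_transition_prob
    if not 0.0 <= a < 1.0:
        raise ValueError("self_transition_prob must be in [0, 1)")
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


# Hypothetical self-transition probability, for illustration only.
pmf = geometric_duration_pmf(0.8, 10)
for d, p in enumerate(pmf, start=1):
    print(f"d={d:2d}  p(d)={p:.4f}")
```

The most likely duration is always a single frame, whatever the value of a, which is the mismatch with speech that explicit duration distributions in an HSMM are meant to address.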
In the present paper, the Monte Carlo EM (MCEM) algorithm with a Gibbs sampler is applied to estimate the parameters of a trajectory HMM, which has been derived from an HMM by imposing explicit relationships between static and dynamic features. The trajectory HMM can alleviate two limitations of the HMM, which are i) constant statistics within a state, and ii) conditional independence of state output...
This paper describes a method for determining the vocal tract spectrum from articulatory movements using hidden Markov models (HMMs). In the proposed system, articulatory parameters are generated from a TTS system and converted to acoustic features to be synthesized. Compared with conventional GMM-based systems, the proposed system has two additional properties: 1) phonetic information given input...