In the current hidden Markov model (HMM) based unit selection speech synthesis method, the optimal phone-sized candidate units are selected following the maximum likelihood (ML) criterion of the HMMs trained for various acoustic features. This paper introduces statistical models for syllable-level F0 features into this method. Different from the frame-level F0 parameters used in the current framework,...
In this paper, we propose a Gaussian mixture model (GMM) based voice conversion method using explicit feature transform models. A piecewise linear transform with stochastic bias is adopted to represent the relationship between the spectral features of source and target speakers. These explicit transformations are integrated into the training of the GMM for the joint probability density of source and target...
In this paper, an automatic prosodic phrase boundary labeling method for speech synthesis databases is presented. This method can be divided into two stages: a training stage and a labeling stage. In the training stage, context-dependent HMMs, which are commonly adopted in HMM-based parametric speech synthesis, are estimated using the training database with manual prosodic labeling. In the labeling stage, the...
To address the overall-micro modeling issue of the current prosody model in HMM-based speech synthesis, a hierarchical F0 modeling method has been proposed, in which different kinds of pitch patterns are characterized by different prosodic layers and a minimum generation error (MGE) training framework is used to simultaneously optimize the F0 models of all layers. This paper investigates the importance of...
This paper presents a minimum generation error (MGE) training method using a weighted Euclidean distance measure on line spectral pairs (LSP) for HMM-based speech synthesis. In this paper, the weighted Euclidean distance on LSP is introduced as the measurement of generation error to improve the consistency between the model training criterion and the subjective perception of the distortion of synthetic...
In this paper, we present a novel approach to relax the stereo-data constraint required by a series of algorithms for noise-robust speech recognition. As a demonstration with the SPLICE algorithm, we use HMM-based speech synthesis to generate pseudo-clean features that replace the ideal clean features from one of the stereo channels. Experimental results on the Aurora2 database show that the...
This paper proposes a state duration modeling method using a full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix, instead of the conventional diagonal covariance matrix, is adopted in the multi-dimensional Gaussian distribution to model the state durations of each context-dependent phoneme. At the synthesis stage, the state durations are predicted using the clustered...
This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. In the conventional HMM-based speech synthesis system, the F0 models are trained for each context-dependent phoneme. Considering the suprasegmental characteristics of F0 features, an explicit syllable-layer F0 model is introduced in this paper. At the synthesis stage, the F0 contour is...
Pruning redundant synthesis instances, or tailoring the TTS voice font, is an important issue in corpus-based TTS. However, pruning redundant synthesis instances usually results in a loss of non-uniform. To solve this problem, this paper proposes the concept of virtual non-uniform. According to this concept and the synthesis frequency of each instance, the algorithm named StaRp-VPA is constructed as...
Because hardware environments are diverse, building a scalable text-to-speech system is an important issue in corpus-based text-to-speech. This paper proposes and analyses three semantic computing problems in building a scalable text-to-speech system: similarity calculation, granular computing, and an automated instance-pruning process framework. According to these, an acoustic clustering algorithm, NuClustering-VPA...
Due to the inconsistency between maximum likelihood (ML) based training and the synthesis application in HMM-based speech synthesis, a minimum generation error (MGE) criterion has been proposed for HMM training. This paper continues to apply the MGE criterion to model adaptation for HMM-based speech synthesis. We propose an MGE linear regression (MGELR) based model adaptation algorithm, where the...