The HMM-based speech synthesis system (HTS) often generates buzzy and muffled speech. Such degradation of voice quality makes synthetic speech sound robotic rather than natural. On this basis, we suppose that synthetic speech lies in a speaker space different from that of the original. We propose to use a voice conversion method to transform synthetic speech toward the original so as to improve its...
The focus of this work is speech synthesis tailored to the needs of spoken dialogue systems. More specifically, the framework of HMM-based speech synthesis is utilized to train an emphatic voice that also considers dialogue context for decision tree state clustering. To achieve this, we designed and recorded a speech corpus comprising system prompts from human-computer interaction, as well as additional...
HMM-based speech synthesis generally suffers from a typical buzziness due to over-simplified excitation modeling of voiced speech. In order to alleviate this effect, several studies have proposed various new excitation models. However, no consensus has been reached on the perceptual importance of accurately modeling the periodic and aperiodic components of voiced speech, and to what extent...
The present paper describes Japanese and English singing voice synthesis systems based on hidden Markov models (HMMs). In this approach, the spectrum, excitation, and vibrato of the singing voice are simultaneously modeled by context-dependent HMMs, and waveforms are generated by the HMMs themselves. Japanese singing voice synthesis systems have already been developed and used to create variable musical...
In this paper, we propose a postfilter to compensate the modulation spectrum in HMM-based speech synthesis. In order to alleviate over-smoothing effects, which are a main cause of quality degradation in HMM-based speech synthesis, it is necessary to consider features that can capture over-smoothing. Global Variance (GV) is one well-known example of such a feature, and the effectiveness of parameter generation...
This paper describes a novel approach for the speaker adaptation of statistical parametric speech synthesis systems based on the interpolation of a set of average voice models (AVM). Recent results have shown that the quality/naturalness of adapted voices depends on the distance from the average voice model used for speaker adaptation. This suggests the use of several AVMs trained on carefully chosen...
In our previous work, we extended the traditional stereo-based stochastic mapping by relaxing the constraint of stereo data, which is not practical in real applications, via HMM-based speech synthesis to construct the “clean” channel data for noisy speech recognition. In this paper, we propose to use deep neural networks (DNNs) for stereo mapping compared with the joint Gaussian mixture model (GMM)...