This paper describes the latest version of the handheld speech-to-speech translation system developed by the National Institute of Information and Communications Technology (NICT). Because all speech-to-speech translation functions are implemented in a single terminal, it provides a real-time, location-free speech-to-speech translation service for many language pairs. A new noise-suppression technique notably...
This paper outlines the National Institute of Information and Communications Technology / Advanced Telecommunications Research Institute International (NICT/ATR) research activities in developing a spoken language translation system, specifically for translating Indonesian spoken utterances to and from Japanese or English. Since the NICT/ATR Japanese-English speech translation system is an established...
This paper describes an approach to the realization of a Vietnamese speech synthesis system applying a technique whereby speech is directly synthesized from Hidden Markov models (HMMs). Spectrum, pitch, and phone duration are simultaneously modeled in HMMs and their parameter distributions are clustered independently by using decision tree-based context clustering algorithms. Several contextual factors...
In this paper, we introduce the Japanese segmental duration characteristics and the computational modeling that we have been studying for speech synthesis for around three decades. We also present a series of experimental results on the loudness dependence of duration perception. The computational duration modeling and the perceptual studies of how sensitivity to duration errors depends on loudness give some insights for computational...
In this paper, we present the derivation of the backfitting training algorithms for generic p-layer additive F0 models, for any positive integer p. We have previously presented the special cases of the algorithms with p = 2 and p = 3, which have been successfully applied to modeling Japanese and English F0 contours, whereas the derivation of the algorithm was presented only for the two-layer case...
We propose an approach to modeling Chinese tonal patterns, focusing on the basic fundamental frequency (F0) patterns characterized by the contextual linguistic features that can be directly extracted from text. We analyze tonal patterns as sparse target points (tonal F0 peaks and valleys) and represent them in parametric form within the framework of a functional F0 model. The relationships between...
Chinese is a tonal language. It has both lexical tones and intonation. The fundamental frequency (F0) contours thereby consist of tone and intonation components. This paper presents an approach to modeling the two components in separate ways and combining them to form the final F0 contours based on a functional F0 model. We analyze tonal patterns as sparse target points (tonal F0 peaks and valleys)...
A new integrated model for the simultaneous modeling of linguistic and acoustic models, together with a training algorithm, is proposed. Usually, text-to-speech (TTS) systems based on the hidden Markov model (HMM) consist of text analysis and speech synthesis modules. Linguistic and acoustic model training are performed independently using different training data sets. Integrated model parameters were simultaneously...
Modulating the frequency of the speaking tone can make speech more interesting and convey subtle meaning in communication. We present a frequency modulation (FM) technique for prosodic modification aimed at communicative speech synthesis. This technique provides a mathematical formulation for representing speaking tone and manipulating FM in a unified framework. Two experiments are conducted with a text-to-speech...
One of the issues with speech synthesizers based on hidden Markov models is the vocoded quality of the synthesized speech. Based on the principle of analysis-by-synthesis speech coders, a trainable excitation model has been proposed to improve naturalness; the method consists of designing a set of state-dependent filters so as to minimize the distortion between residual and synthetic excitation...
Corpus-based concatenative speech synthesis is very popular these days due to its highly natural speech quality. The amount of computation required at run time, however, is often quite large, and various approaches have been proposed for reducing this runtime computation. In this paper, we propose early stopping schemes for Viterbi beam search in unit selection, with which we can stop early...
The prosodic contributions to voice fundamental frequency (F0) contours can be analyzed as a series of sparse tonal targets (F0 peaks and valleys). The transitions through these targets are interpolated by spline or filtering functions to predict the shape of F0 contours. A functional model was proposed in previous work for this purpose. This paper presents an enhanced version of this model...