The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
English is the only language available for global communication and is used by approximately 1.5 billions of speakers. It is also known to have a large diversity of pronunciation due to the influence of speakers' mother tongue, called accents. Our project aims at creating a global and individual-basis map of English pronunciations to be used in teaching and learning World Englishes (WE) as well as...
English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If...
This paper points out that no existing technically-implemented speech model is adequate enough to describe one of the most fundamental and unique capacities of human speech processing. Language acquisition of infants is based on vocal imitation but they don't impersonate their parents and imitate only the linguistic and para-linguistic aspects of the parents' utterances. The vocal imitation is found...
Previous researches have indicated the relevance between segmental duration and syntax information, but the usefulness of syntax features have not been thoroughly studied for predicting segmental duration. In this paper, we design two sets of syntax features to improve Mandarin phone and pause duration prediction respectively. Instead of using manually extracted syntax information as previous researches...
HMM-based speech synthesis is known to be a possible solution for realizing "flexibility" in speech synthesis. However, its frame-by-frame process of acoustic features is not appropriate for prosodic features. Prosodic features cover a wider time span as compared to segmental features, and should be handled differently. From this point of view, a method has been developed for generating...
A method was proposed to increase the naturalness of prosody generated with speech synthesis based on hidden Markov models (HMMs). This method adds a constraint to the fundamental frequency contours (F0 contours) during the HMM-based speech synthesis. The constraint adopted is the generation process model of F0 contours (F0 model). The method first extracts the F0 model parameters from the original...
Voice conversion can be reduced to a problem to find a transformation function between the corresponding speech sequences of two speakers. Perhaps the most voice conversions methods are GMM-based statistical mapping methods. However, the classical GMM-based mapping is frame-to-frame, and cannot take account of the contextual information existing over a speech sequence. It is well known that HMM yields...
Automatic estimation of pronunciation proficiency has its specific difficulty. Adequacy in controlling the vocal organs can be estimated from spectral envelopes of input utterances but the envelope patterns are also affected easily by different speakers. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors and those by extra-linguistic...
This paper proposes hidden structure model (HSM) for statistical modeling of sequence data. The HSM generalizes our previous proposal on structural representation by introducing hidden states and probabilistic models. Compared with the previous structural representation, HSM not only can solve the problem of misalignment of events, but also can conduct structure-based decoding, which allows us to...
This paper introduces a model of mixture of probabilistic linear regressions (MPLR) to learn a mapping function between two feature spaces. The MPLR consists of weighted combination of several probabilistic linear regressions, whose parameters are estimated by using matrix calculation. The mixture nature of MPLR allows it to model nonlinear transformation. The formulation of MPLR is general and independent...
A total corpus-based process of generating prosodic features from text is developed. The process first predicts pauses and phone durations, and then generates F0 contours. Since F0 contour generation is based on the generation process model, it is rather easy to manipulate the generated F0 contours in command level. A method was developed for generating sentence F0 contours, when a focus is placed...
This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to affine transformation is proved mathematically through algebraic calculation. We apply the AIFs to speech recognition. Since the vocal tract length (VTL) difference causes to frequency warping which can be approximated well by...
This paper studies phase singularities (PSs) for image representation. We show that PSs calculated with Laguerre-Gauss filters contain important information and provide a useful tool for image analysis. PSs are invariant to image translation and rotation. We introduce several invariant features to characterize the core structures around PSs and analyze the stability of PSs to noise addition and scale...
Shadowing is a practice that requires learners to shadow a presented native utterance as closely and quickly as possible. Learners' pronunciation in shadowing, especially in the case of beginners, often becomes inarticulate and corrupt. These features of shadowing make it very difficult to assess shadowing productions. In this paper, we investigate the automatic pronunciation scoring methods for shadowing...
Mandarin speech synthesis was conducted by generating prosodic features by the proposed method and segmental features by HMM-based method. The proposed method generates sentence fundamental frequency (F0) contours by representing them as a superposition of tone components on phrase components. The tone components are realized by concatenating their fragments at tone nuclei predicted by a corpus-based...
A total corpus-based process of generating prosodic features form text is developed. The process first predicts pauses and phone durations, and then generates F0 contours. Since F0 contour generation is based on the generation process model, it is rather easy to manipulate the generated F0 contours in command level. A method was developed for generating sentence F0 contours, when a focus is placed...
Most of the speech synthesizers have been developed as text (phoneme sequence) to speech converters and, in this framework, text input is a precondition for speech production. However, we can say that no child acquires spoken language by reading a given text out. Children are explained to acquire spoken language by imitating the utterances of their parents but they never imitate the voices of their...
This research presents an innovative system for adaptive speech denoising using Independent Component Analysis (ICA) and Voice Activity Detection (VAD). Designed for instantaneous mixtures (two sources and two microphones), the proposed system identifies the noise contained in each noisy mixture. For that type of noise applies the most suitable ICA method among three methods (FastICA, Kernel ICA and...
Phase features are widely used in image processing and representation due to their stability to deformation and noise. However, phase singularities,where the signals vanish, are generally regarded as harmful and unreliable facts. In this paper, on the contrary, we will show that phase singularities calculated by Laguerre-Gauss filter contain important information of input image and can provide a reliable...
Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge on linguistic contents and acoustic models, and thus poses a challenging problem. The essential question here is what is the optimal segmentation. This paper formulates the optimal segmentation problem into a probabilistic framework. Using statistics...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.