This paper proposes a unit-selection and waveform concatenation speech synthesis system based on synthetic speech naturalness evaluation. A Support Vector Machine (SVM) and Log Likelihood Ratio (LLR) based synthetic speech naturalness evaluation system was introduced in our previous work. In this paper, the evaluation system is improved in three aspects. Finally, a unit-selection and concatenation...
A common recent approach to monitoring and adapting system behavior at runtime is to decouple one or more external modules and self-adaptive mechanisms from the target system. The main advantage of this non-invasive approach is separation of concerns. However, some uncertainties emerge when these separate control units are used. The unanticipated inherence and complexity of upcoming services...
Accidental gas leaks from unknown sites can cause serious environmental pollution. One efficient way to address the problem is to track and locate the plume source. This paper presents a wireless sensor network equipped with gas sensors that monitors the environment online and estimates the location of a gas source based on the concentration readings at the wireless sensor...
In this paper, we present a novel approach to relaxing the stereo-data constraint required by a range of algorithms for noise-robust speech recognition. As a demonstration with the SPLICE algorithm, we use HMM-based speech synthesis to generate pseudo-clean features that replace the ideal clean features from one of the stereo channels. Experimental results on the Aurora2 database show that the...
This paper proposes a state duration modeling method using a full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix, instead of the conventional diagonal covariance matrix, is adopted in the multi-dimensional Gaussian distribution that models the state durations of each context-dependent phoneme. At the synthesis stage, the state durations are predicted using the clustered...
This paper presents an investigation into ways of integrating articulatory features into hidden Markov model (HMM)-based parametric speech synthesis. In broad terms, this may be achieved by estimating the joint distribution of acoustic and articulatory features during training. This may in turn be used in conjunction with a maximum-likelihood criterion to produce acoustic synthesis parameters for...
Semi-tied covariance (STC) is widely applied in speech recognition because of its ability to de-correlate features. Solving for the transform matrices of STC is a nonlinear optimization problem. Gales proposed an efficient method that iteratively updates one row of the transform matrices at a time. However, it requires computing the cofactors of the elements of a matrix row inside two nested loops. Directly solving them is very time-consuming...
Posterior probability is the most commonly used measure for pronunciation evaluation. This paper introduces pronunciation space models that replace traditional phone-based acoustic models in calculating the posterior probability, which makes the calculated posterior probability more precise. Pronunciation space models are constructed using an unsupervised clustering method guided by human scores and phone-level posterior probability...
This paper presents a method for modeling the dependency between F0 and spectral features in an HMM-based parametric speech synthesis system. In conventional systems these two features are modeled as two independent streams, which is inconsistent with the fact that there is always interaction between the extracted F0 and spectral parameters used for model training. A piecewise linear transform...
In this paper, appropriate confidence measures (CMs) are investigated for Mandarin command word recognition in both the so-called target region and the non-target region. Here the target region refers to the part of the speech recognized as the command word, while the non-target region refers to the part recognized as silence. It is shown that exploiting extra information in the non-target region can effectively...
Tonal evaluation of Chinese continuous speech plays an important role in Mandarin Chinese pronunciation tests. In this paper, we introduce a Multi-Space Distribution Hidden Markov Model based on prosodic words. The results show that the tonal syllable error rate can be reduced. For the non-standard Chinese Mandarin speech, the correlation between computer score and expert score was...
To address the issues related to maximum likelihood (ML) based HMM training for HMM-based speech synthesis, a minimum generation error (MGE) criterion has been proposed. This paper applies the MGE criterion to model adaptation for HMM-based speech synthesis. We introduce an MGE linear regression (MGELR) based model adaptation algorithm, where the transforms from source HMMs to...
Because of the inconsistency between maximum likelihood (ML) based training and the synthesis application in HMM-based speech synthesis, a minimum generation error (MGE) criterion has been proposed for HMM training. This paper applies the MGE criterion to model adaptation for HMM-based speech synthesis. We propose an MGE linear regression (MGELR) based model adaptation algorithm, where the...
This paper presents a novel discriminative training criterion, minimum word classification error (MWCE). By localizing the conventional string-level MCE loss function to the word level, a more direct measure of empirical word classification error is approximated and minimized. Because the word-level criterion better matches performance evaluation criteria such as WER, an improved word recognition performance...
This paper presents a minimum unit selection error (MUSE) training method for an HMM-based unit selection speech synthesis system, which selects the optimal phone-sized unit sequence from the speech database by maximizing the combined likelihood of a group of trained HMMs. Under the MUSE criterion, the weights and distribution parameters of these HMMs are estimated to minimize the number of different units...
Recently, we proposed a novel optimization algorithm called constrained line search (CLS) to train the Gaussian mean vectors of HMMs in the MMI sense. In this paper, we extend and reformulate it in a more general framework. The new CLS can optimize any discriminative objective function, including MMI, MCE, and MPE/MWE. Also, closed-form solutions to update all Gaussian mixture parameters, including means,...
To reduce the burden of human management, runtime self-adaptation has recently emerged as an important characteristic required by most intelligent software-intensive systems. Most methods are built upon the analysis of architectural concepts and exploit some "craft" from the perspective of qualitative analysis. However, these methods are often incapable of reasoning about the history...
This paper presents a word graph based feature enhancement method for robust speech recognition in noise. The approach uses signal processing based speech enhancement as a starting point, and then performs Wiener filtering to remove residual noise. During the process, a decoded word graph is used to directly guide the feature enhancement with respect to the HMM for recognition, so that the enhanced...
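As a rough illustration of the Wiener filtering step mentioned in the abstract above, the following minimal sketch computes a per-bin Wiener gain from power spectra. It is not the word-graph-guided method of the paper: the function name `wiener_gains`, the spectral-subtraction form of the gain, and the `floor` parameter are all assumptions made for illustration.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin Wiener-style gains from noisy and estimated noise power.

    Hypothetical helper: uses gain = 1 - noise/noisy, which equals
    speech/(speech + noise) when noisy = speech + noise, and clamps
    the result to a small floor so bins are never fully zeroed.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("power spectra must have the same length")
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            # No energy in this bin: fall back to the floor.
            gains.append(floor)
        else:
            gains.append(max(1.0 - p_noise / p_noisy, floor))
    return gains


def apply_gains(noisy_power, gains):
    """Scale each power bin by the squared gain (gain acts on amplitude)."""
    return [p * g * g for p, g in zip(noisy_power, gains)]
```

Bins where the estimated noise exceeds the observed power are clamped to `floor` rather than set to zero, a common way to limit musical-noise artifacts.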
In this paper, we propose a novel constrained line search to optimize the MMIE objective function for training discriminative HMMs. In our method, the MMI estimation is cast as a constrained maximization problem, where the Kullback-Leibler divergence between the models before and after parameter adjustment is introduced as a constraint during optimization. Then, based on the idea of line search, we show...
We propose to use minimum divergence (MD), where acoustic similarity between HMMs is characterized by Kullback-Leibler divergence (KLD), for discriminative training. The MD objective function is defined as a posterior weighted divergence measured over the whole training set. Unlike our earlier work, where KLD-based acoustic similarity is pre-computed for all initial models and stays invariant in the...
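The Kullback-Leibler divergence used in the two abstracts above has a closed form for Gaussians, which is the building block for comparing HMM state distributions. The sketch below covers only the diagonal-covariance case for a single pair of Gaussians. The function names are hypothetical, and extending this to full HMMs (state alignment, mixtures, posterior weighting) is not shown and is not taken from the papers.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances.

    Per dimension: 0.5 * (log(vq/vp) + (vp + (mp - mq)^2) / vq - 1).
    """
    if not (len(mean_p) == len(var_p) == len(mean_q) == len(var_q)):
        raise ValueError("all parameter vectors must have the same length")
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        if vp <= 0.0 or vq <= 0.0:
            raise ValueError("variances must be positive")
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))
```

The divergence is zero for identical distributions and is not symmetric in general, which is why a symmetrized variant is often used as a similarity measure.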