In recent years, unit-selection-based concatenative speech synthesis using a large speech database has become popular because it can produce high-quality synthesized speech. However, such a large database is impractical for many applications, such as those ported to embedded devices, because of the storage it requires and the computational complexity of searching it. In...
In text-independent speaker verification, an unsupervised mode can improve system performance. In traditional systems, the speaker model is updated when a test utterance scores higher than a particular threshold; we call this unsupervised model training. In this paper, an unsupervised score normalization is proposed. A target-speaker score Gaussian and an impostor score Gaussian are set up as priors; the...
This paper reports a detailed study on minimum phone error (MPE), minimum phone frame error (MPFE), and a physical-state level version of minimum Bayes risk (sMBR) training, as well as several modified versions of them, for transcription of large vocabulary Mandarin broadcast news. We found that the results are quite different from those observed previously for English and Arabic broadcast news tasks [1],...
In this paper we present a conversational dialogue system, CityBrowser II, which allows users to ask for information about restaurants in Mandarin. Developed in the Galaxy infrastructure with a common, language-independent semantic representation, CityBrowser integrates portability and scalability. By inheriting the infrastructure and main language understanding/generation components from its...
In this paper, we present a novel pseudo-key analysis approach for fusion systems in language recognition. State-of-the-art language recognition systems for the NIST language recognition evaluation (LRE) commonly consist of multiple language classifiers. To prevent the fusion system from being spoiled by a single abnormal classifier, pseudo keys are designed to check the integrity of each of the individual...
The GMM-UBM system is the current state-of-the-art approach for text-independent speaker verification. The advantage of the approach is that both the target speaker model and the impostor model (UBM) can generalize to "unseen" acoustic patterns. However, since GMM-UBM uses a common anti-model, namely the UBM, for all target speakers, it tends to be weak in rejecting impostors' voices...
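As an illustration of the common anti-model idea, the following is a minimal sketch of log-likelihood-ratio scoring against a UBM. It is not taken from the paper: the function names, the diagonal-covariance assumption, and the toy two-component models are all hypothetical, and the "target" model here simply shifts one UBM mean as a stand-in for a properly adapted speaker model.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm); gmm is a list of (weight, mean, var) components."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logs))

def llr_score(frames, target_gmm, ubm):
    """Average per-frame log-likelihood ratio of the target model vs. the UBM."""
    total = sum(
        gmm_log_likelihood(x, target_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)

# Toy models (hypothetical values): the target differs from the UBM in one mean.
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]
target = [(0.5, [0.5, 0.5], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]

target_frames = [[0.6, 0.4], [0.5, 0.7]]        # close to the target's shifted mean
impostor_frames = [[-0.5, -0.6], [-0.4, -0.3]]  # on the far side of the UBM mean

target_score = llr_score(target_frames, target, ubm)
impostor_score = llr_score(impostor_frames, target, ubm)
```

Because every target speaker is scored against the same UBM, the ratio only measures how much better the target model fits than a generic background; this is the property the abstract identifies as a weakness when rejecting impostors.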
In this paper, a new feature selection method for speaker recognition is proposed to keep the high-quality speech frames for speaker modelling and to remove noisy and corrupted speech frames. In order to obtain robust voice activity detection in a variety of acoustic conditions, the spectral subtraction algorithm is adopted to estimate the frame power. An energy based frame selection algorithm is then...
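The general idea of energy-based frame selection can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it subtracts a full-band noise power estimate instead of performing spectral subtraction per frequency bin, and the function names, the assumption that the first frames contain only noise, and the relative threshold are all hypothetical.

```python
def frame_power(signal, frame_len):
    """Mean-square power of consecutive non-overlapping frames."""
    n = len(signal) // frame_len
    return [
        sum(s * s for s in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n)
    ]

def select_frames(signal, frame_len=4, noise_frames=2, rel_threshold=0.1):
    """Return indices of frames whose noise-subtracted power is high enough.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    powers = frame_power(signal, frame_len)
    if not powers:
        return []
    lead = powers[:noise_frames]
    noise = sum(lead) / len(lead)
    # Subtract the noise estimate and floor at zero.
    cleaned = [max(p - noise, 0.0) for p in powers]
    peak = max(cleaned)
    if peak == 0.0:
        return []
    return [i for i, p in enumerate(cleaned) if p >= rel_threshold * peak]

# Toy signal: two quiet frames, two loud frames, one quiet frame.
quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
signal = quiet * 2 + loud * 2 + quiet

kept = select_frames(signal)
print(kept)  # [2, 3]
```

Only the two loud frames survive; in a speaker recognition front end, features would then be extracted from the kept frames alone.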
This paper proposes to perform latent semantic analysis (LSA) on character/syllable n-gram sequences of automatic speech recognition (ASR) transcripts, namely subword LSA, as an extension of our previous work on subword text tiling for automatic story segmentation of Chinese broadcast news. LSA represents the 'meaning' of a lexical term by a feature vector conveying the term's relations with other...
Session variability increasingly degrades the performance of speaker verification systems. This paper discusses eigenchannel compensation and investigates a symmetric scoring method to reduce session variability and further enhance performance. Experiments were conducted on the core tests of the 2006 and 2008 speaker recognition evaluation...
In this paper, a synchronous method based on a state graph is proposed to calculate the evaluation feature for automatic scoring in computer-assisted language learning (CALL). The posterior probabilities of states are selected as the main feature. The scores of hypothesized phonemes and words are estimated using the information of the corresponding states. Traditional systems use two passes and two different...
In the speech enhancement literature, the signal subspace based method has attracted considerable attention because of the simplicity of its analytical formulation. The original idea of this method rests on the assumption that the clean speech signal occupies a certain low-dimensional space, while the noise signal, which is additive white noise, spans the whole observation space. In this method, accurate estimation...
The present study investigates the intonational patterns of imperative sentences in Standard Chinese, especially those with an intensive mood, such as ordering and forbidding. Grouping the sentences by length and focusing on the fundamental frequency, this paper tries to provide a description of the pitch patterns of Chinese strong imperatives. Compared with the declarative sentence, the pitch contour of...
Word alignment plays a critical role in statistical machine translation (SMT) and cross-language information retrieval. Until now, most existing methods have computed the word alignment over the whole length of the sentence, and the resulting alignment quality is unsatisfactory. In this paper, we propose a novel approach to word alignment based on a multi-grain model (WAMG). We split a parallel sentence pair into blocks...
Semi-tied covariance (STC) is widely applied in speech recognition due to its feature de-correlation ability. Solving for the transform matrices of STC is a nonlinear optimization problem. Gales proposed an efficient method that iteratively updates one row of the transform matrices at a time. However, it needs to compute the cofactors of the elements of a matrix row in two nested loops. Directly computing them is very time-consuming...
We propose an acoustic feature for speech recognition based on the combination of MFCC and the fractional Fourier transform (FrFT). The transform orders for the FrFT are set adaptively according to the intraframe pitch change rate. This method is motivated by the fact that speech is not stationary even over a short period of time, and the idea is shown using an AM-FM speech model and some spectrograms of...