The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper presents a product-of-expert framework to perform probabilistic acoustic mapping for cross-lingual phone recognition. Under this framework, the posterior probabilities of the target HMM states are modelled as the weighted product of experts, where the experts or their weights are modelled as functions of the posterior probabilities of the source HMM states generated by a foreign phone recogniser...
In this paper, we investigate some specific acoustic problems of the computer assisted language learning (CALL) system by modifying the acoustic model and feature under the speech recognition framework. At first, in order to alleviate the distortion of channel and speaker, speaker-dependent Cepstrum Mean Normalization (Speaker CMN) is adopted, by which the average correlation coefficient (ACC) between...
Short vowels in Arabic are normally omitted in written text which leads to ambiguity in the pronunciation. This is even more pronounced for dialectal Arabic where a single word can be pronounced quite differently based on the speaker's nationality, level of education, social class and religion. In this paper we focus on pronunciation modeling for Iraqi-Arabic speech. We introduce multiple pronunciations...
This paper takes phonetic information into account for data alignment in text-independent voice conversion. Hidden Markov models are used for representing the phonetic structure of training speech. States belonging to same phoneme are grouped together to form a phoneme cluster. A state mapped codebook based transformation is established using information on the corresponding phoneme clusters from...
In this issue, "Best of the Web" presents online resources available to tackle the problem of speech recognition. Automatic speech recognition turns spoken audio into a sequence of words. It is an extraordinarily broad and multidisciplinary field, drawing primarily from statistical signal processing, machine learning, and linguistics. Although speech-recognition-oriented applications are...
Application of linguistic knowledge of language transfer to automatic speech recognition (ASR) technology can enhance mispronunciation detection performance in computer-aided pronunciation training (CAPT). This is achieved by pinpointing salient pronunciation errors made by second language learners. In this work, we propose to apply decision fusion for further improvement in mispronunciation detection...
Acoustic confusions degrade the accuracy of pronunciation assessment severely in computer assisted language learning (CALL) systems. This paper presents our recent study on effective modeling of the acoustic confusions. We change the traditional Mandarin syllable structure, which is composed of initial and final, to a novel phoneme structure. Several phoneme splitting strategies are investigated,...
Speech understanding requires the ability to parse spoken utterances into words. But this ability is not innate and needs to be developed by infants within the first years of their life. So far almost all computational speech processing systems neglected this bootstrapping process. Here we propose a model for early infant word learning embedded into a layered architecture comprising phone, phonotactics...
This paper presents a method using speech recognition with linguistic constraints to detect the mispronunciations made by Cantonese learners of English. The predicted pronunciation errors have been derived from cross-language phonological comparisons, which are used to generate the erroneous pronunciation variations in a lexicon. The acoustic models are trained with native speakerspsila speech and...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.