This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for a Chinese-English mixed-language keyword spotting (KWS) system. Often, robust Chinese-English mixed-language spoken language technology must be developed without Chinese-accented English acoustic data. In this paper, we exploit a pronunciation conversion method based on syllable-based characteristic...
Prosodic phrasing is a classic problem in natural language processing that is useful not only for text-to-speech (TTS) but also for speech recognition, statistical machine learning, etc. This paper introduces and discusses the source-channel model for Chinese prosodic phrasing. Based on the basic idea, the hidden Markov model (HMM) and the improved source-channel model are both used to describe the phrasing...
This paper describes the system and algorithmic developments in the automatic transcription of Mandarin broadcast speech made at IBM in the second year of the DARPA GALE program. Technical advances over our previous system include improved acoustic models using embedded tone modeling, and a new topic-adaptive language model (LM) rescoring technique based on dynamically generated LMs. We present results...
In this paper, we propose a novel voice conversion method that combines frequency warping and unit selection to improve similarity to the target speaker. We use frequency warping to obtain the warped source spectrum, which serves as the estimated target for the later unit selection of the target speaker's spectrum. Such an estimated target can preserve the natural transitions of human speech. Then, part of...
This paper describes the technical and system building advances in the automatic transcription of Mandarin broadcast speech made at IBM in the first year of the DARPA GALE program. In particular, we discuss the application of minimum phone error (MPE) discriminative training and a new topic-adaptive language modeling technique. We present results on both the RT04 evaluation data and two larger community-defined...