The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in Mandarin conversational speech. Observing the KWS corpus, we found there are many words with similar pronunciation with predefined keywords, although they have different Chinese characters and different meanings, which easily result in high false alarm rate. In this paper, an improved...
This paper presents automatic pronunciation transliteration method with acoustic and contextual analysis for Chinese-English mixed language keyword spotting (KWS) system. More often, we need to develop robust Chinese-English mixed language spoken language technology without Chinese accented English acoustic data. In this paper, we exploit pronunciation conversion method based on syllable-based characteristic...
The performance of HMM-based text to speech (TTS) system is affected by the basic modeling units and the size of training data. This paper compares two HMM based Mandarin TTS systems using syllable and phone as basic units respectively with 1000, 3000 and 5000 sentences' training data. Two female speakers' corpora are used as training data for evaluation. For both corpora, the system using syllable...
This paper gives an up-to-date description of the IBM Mandarin broadcast transcription system developed under the DARPA GALE program. Technical advances over our previous system include a novel acoustic modeling approach using subspace Gaussian mixture models, a speaking rate adaptation method using frame rate normalization, and an effective recipe for lattice combination. We present results on three...
The prosodic phrasing is a classic problem in nature language process, which is not only useful for text-to-speech(TTS), but for speech recognition, statistic machine learning etc.. This paper introduces and discusses the source-channel model for Chinese prosodic phrasing. Based on the basic idea, the hidden Markov model (HMM) and the improved source-channel model are both used to describe the phrasing...
In this paper, we explore an approach to improved confidence measures based on a novel alignment confusion rate (ACR) which integrates alignment information from two different modeling unit sets in Chinese digits recognition system. Both initial-final (IF) phone set and head-body-tail (HBT) models have proven to obtain good recognition performance for connected digit strings. These two different modeling...
The tone is a distinctive discriminative feature in Mandarin Chinese. Often functional, yet seldom thorough are most large-scale Mandarin speech recognition systems in treating tone modeling. In particular, many lack the necessary sophistication to deal with the myriad variations arising from the combination of acoustic and lexical contexts. This paper reports an attempt to account for these variabilities...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.