The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). On the other hand, end-to-end speech recognition system can achieve state-of-the-art performance using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and Connectionist Temporal Classification (CTC) method for unsegmented sequence data. In this paper, we therefor propose a lightweight...
In this work we concentrate on generating compound words with high order n-gram information for speech recognition. In most existing compound words generation methods, only bi-gram information is considered. They are successful for improving the performance of bi-gram models but doesn't work well in higher order n-gram cases. Since nowadays 3-gram and 4-gram language models are commonly used, here...
This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in Mandarin conversational speech. Observing the KWS corpus, we found there are many words with similar pronunciation with predefined keywords, although they have different Chinese characters and different meanings, which easily result in high false alarm rate. In this paper, an improved...
This paper presents automatic pronunciation transliteration method with acoustic and contextual analysis for Chinese-English mixed language keyword spotting (KWS) system. More often, we need to develop robust Chinese-English mixed language spoken language technology without Chinese accented English acoustic data. In this paper, we exploit pronunciation conversion method based on syllable-based characteristic...
Pronunciation variation is a natural and inevitable phenomenon in an accented Mandarin speech recognition application. In this paper, we integrate knowledge-based and data-driven approaches together for syllable-based pronunciation variation modeling to improve the performance of Mandarin speech recognition system for speakers with Southern accent. First, we generate the syllable-based pronunciation...
This paper gives an up-to-date description of the IBM Mandarin broadcast transcription system developed under the DARPA GALE program. Technical advances over our previous system include a novel acoustic modeling approach using subspace Gaussian mixture models, a speaking rate adaptation method using frame rate normalization, and an effective recipe for lattice combination. We present results on three...
The prosodic phrasing is a classic problem in nature language process, which is not only useful for text-to-speech(TTS), but for speech recognition, statistic machine learning etc.. This paper introduces and discusses the source-channel model for Chinese prosodic phrasing. Based on the basic idea, the hidden Markov model (HMM) and the improved source-channel model are both used to describe the phrasing...
The Query-by-Humming (QBH) system allows users to retrieve songs by singing/humming. In this paper we propose a phrase-level piecewise linear scaling algorithm for melody match. Musical phrase boundaries are predicted for the query to split it to phrases. The boundaries of melody fragment corresponding to each phrase are allowed for adjusting in a limited scope. The algorithm employs Dynamic Programming...
In this paper, we explore an approach to improved confidence measures based on a novel alignment confusion rate (ACR) which integrates alignment information from two different modeling unit sets in Chinese digits recognition system. Both initial-final (IF) phone set and head-body-tail (HBT) models have proven to obtain good recognition performance for connected digit strings. These two different modeling...
The tone is a distinctive discriminative feature in Mandarin Chinese. Often functional, yet seldom thorough are most large-scale Mandarin speech recognition systems in treating tone modeling. In particular, many lack the necessary sophistication to deal with the myriad variations arising from the combination of acoustic and lexical contexts. This paper reports an attempt to account for these variabilities...
This paper describes the system and algorithmic developments in the automatic transcription of Mandarin broadcast speech made at IBM in the second year of the DARPA GALE program. Technical advances over our previous system include improved acoustic models using embedded tone modeling, and a new topic-adaptive language model (LM) rescoring technique based on dynamically generated LMs. We present results...
This paper describes the technical and system building advances in the automatic transcription of Mandarin broadcast speech made at IBM in the first year of the DARPA GALE program. In particular, we discuss the application of minimum phone error (MPE) discriminative training and a new topic-adaptive language modeling technique. We present results on both the RT04 evaluation data and two larger community-defined...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.