This paper compared the performance of different acoustic modeling units in deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) systems for Chinese. Recently, DNN-based acoustic modeling has achieved very competitive performance on many speech recognition tasks and has become the focus of current LVCSR research. Some previous work...
Prosody provides cues that are critical to human speech perception and comprehension, so it is plausible to integrate prosodic information into machine speech recognition. However, because of its supra-segmental nature, prosodic information is hard to integrate with conventional acoustic features. Recently, RNNLMs have been shown to be state-of-the-art language models in many tasks. We thus...
Chinese Syllable-to-Character (S2C) conversion is an important component of input methods, and the key problem in Chinese S2C conversion is the serious homophone phenomenon in the Chinese language. To disambiguate homophones and improve Chinese S2C conversion, this paper treats Chinese S2C conversion as a sequence labelling task, and the recurrent neural network (RNN) based on supervise sequence...
Non-thermal plasma (NTP) and combined plasma-MnO2 catalytic (CPMC) air cleaners were tested for removal of low-concentration benzene in air. Both air cleaners were made of a stainless steel needle-matrix plate and used a DC corona discharger. The effects of discharge power and relative humidity (RH) on benzene removal efficiency were investigated in a closed chamber. The intermediate products produced...
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, this research investigates deep extensions of LSTM, considering that deep hierarchical models have turned out to be more efficient than shallow ones. Motivated by previous research on constructing...
s have been shown to give state-of-the-art performance on many speech recognition tasks. To achieve a further performance improvement, in this paper, maxout units are proposed to be integrated with the LSTM cells, considering those units have brought significant improvements to deep feed-forward neural networks. A novel architecture was constructed by replacing the input activation units (generally...
Recently, the context-dependent (CD) deep neural network (DNN) hidden Markov model (HMM) has obtained significant improvements in many automatic speech recognition (ASR) tasks. In the standard training procedure for CD-DNN-HMM, a Gaussian mixture model (GMM) based ASR system must first be built to pre-segment the training data and to define the CD states as the targets for the DNN. In this paper, we propose...
To construct the pronunciation dictionary for Mandarin speech recognition, an automatic, error-driven, incremental approach based on the acoustic confusion network is proposed. This method considers both acoustic and language information, and constructs a dictionary through word selection and composition to optimize ASR performance directly. During the process, removing and splitting...
Recently, the deep neural network (DNN) combined with the hidden Markov model (HMM) has turned out to be a superior sequence learning framework, yielding significant improvements in many application tasks, such as automatic speech recognition (ASR). However, the training of DNN-HMM requires pre-segmented training data, which can be generated using a Gaussian mixture model (GMM) in ASR tasks...
Recurrent neural network language models (RNNLMs) have been successfully applied in a variety of language processing applications ranging from speech recognition to machine translation. They can fight the curse of dimensionality by learning a distributed representation (word vector). The components of these vectors measure the co-occurrence of the word with context features over a corpus. However,...
This paper describes a query-based composition algorithm that can integrate an ARPA-format language model into the unified WFST framework, avoiding the memory and time cost of converting language models to WFSTs and optimizing them. The proposed algorithm is applied to an on-the-fly one-pass decoder and a rescoring decoder. Both modified decoders require less memory during decoding...
Chinese, which is quite different from Western languages, has no standard definition of a word. Therefore, choosing a suitable lexicon plays an important role in Chinese language modeling. This paper proposes a novel method of constructing the lexicon automatically. Rather than depending on statistical measures of text features, this method is directly based on the feedback of errors from the corresponding...
Recently, deep neural network (DNN) based acoustic modeling methods have been successfully applied to many speech recognition tasks. This paper reports work on applying DNNs to syllable-based acoustic modeling in Chinese automatic speech recognition (ASR). Compared with initials/finals (IFs), syllables can implicitly model intra-syllable variations more accurately. However, the...
This paper concentrates on the effect of part-of-speech on Mandarin speech recognition by incorporating it into the language model and pronunciation dictionary. This work is motivated by two benefits of part-of-speech: it reduces lexical ambiguity in the language model to some extent, and it provides some information about the pronunciation of heteronyms. The experiments conducted...