The paper describes automatic speech recognition experiments and results on the spontaneous Hungarian MALACH speech corpus. A novel morph-based lexical modeling approach is compared to the traditional word-based one and to another, previously best performing morph-based one in terms of word and letter error rates. The applied language and acoustic modeling techniques are also detailed. Using unsupervised...
Inspired by the success of multi-task training in acoustic modeling, this paper investigates a new architecture for a multi-domain neural network based language model (NNLM). The proposed model has several shared hidden layers and domain-specific output layers. As will be shown, the log-linear interpolation of the multi-domain outputs and the optimization of interpolation weights fit naturally in...
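Log-linear interpolation of several domain-specific output distributions can be sketched as follows. This is a minimal illustration of the general technique, not the architecture or training procedure from the paper: the function names, the toy logits, and the fixed weights are assumptions made here, and the weights are not optimized.

```python
import math


def log_softmax(logits):
    """Return log-probabilities for a list of unnormalized scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def log_linear_interpolate(domain_logits, weights):
    """Combine per-domain distributions p_d(w|h) as prod_d p_d(w|h)^lambda_d.

    domain_logits: one list of output-layer scores per domain (hypothetical
                   stand-ins for domain-specific output layers).
    weights:       one interpolation weight lambda_d per domain.
    The weighted product is renormalized over the vocabulary.
    """
    vocab_size = len(domain_logits[0])
    combined = [0.0] * vocab_size
    for logits, lam in zip(domain_logits, weights):
        for i, log_p in enumerate(log_softmax(logits)):
            combined[i] += lam * log_p
    # Renormalize the weighted sum of log-probabilities.
    return [math.exp(v) for v in log_softmax(combined)]


# Toy example: two domains, three-word vocabulary.
toy_logits = [[1.0, 2.0, 0.5], [0.2, 0.1, 3.0]]
probs = log_linear_interpolate(toy_logits, [0.7, 0.3])
```

Because the combination is a weighted sum in the log domain followed by a normalization, it has the same form as a linear layer followed by a softmax.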
This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data. Multilingual acoustic representations proved to...
In the tandem approach, the output of a neural network (NN) serves as input features to a Gaussian mixture model (GMM), with the aim of improving the emission probability estimates. As shown in our previous work, a GMM with a pooled covariance matrix can be integrated into a neural network framework as a softmax layer with hidden variables, which allows for joint estimation of both neural network and...
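The link between Gaussians with a shared covariance and a softmax layer can be checked numerically. The sketch below is a simplification under stated assumptions, not the model from the paper: it uses a single Gaussian per class (no hidden mixture variables) and a pooled diagonal covariance, and all names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into probabilities."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_posteriors(x, means, var, priors):
    """Class posteriors via Bayes' rule with one Gaussian per class.

    All classes share the same (pooled) diagonal variance vector `var`.
    """
    log_joint = []
    for mu, prior in zip(means, priors):
        log_lik = sum(
            -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (xi - mi) ** 2 / v
            for xi, mi, v in zip(x, mu, var)
        )
        log_joint.append(math.log(prior) + log_lik)
    return softmax(log_joint)


def softmax_layer_posteriors(x, means, var, priors):
    """Same posteriors written as softmax(w_s . x + b_s).

    With a pooled covariance, the quadratic term in x and the Gaussian
    normalization constant are identical for all classes and cancel.
    """
    scores = []
    for mu, prior in zip(means, priors):
        w = [mi / v for mi, v in zip(mu, var)]
        b = -0.5 * sum(mi * mi / v for mi, v in zip(mu, var)) + math.log(prior)
        scores.append(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return softmax(scores)


# Toy values (assumed for illustration only).
x = [0.3, -1.2]
means = [[0.0, 0.0], [1.0, -1.0], [-0.5, 2.0]]
var = [0.5, 2.0]
priors = [0.2, 0.5, 0.3]
bayes = gaussian_posteriors(x, means, var, priors)
linear = softmax_layer_posteriors(x, means, var, priors)
```

Both functions return the same posteriors up to floating-point error, which is why such a model can be trained as an ordinary softmax layer.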
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By...
This paper investigates the application of hierarchical MRASTA bottleneck (BN) features for under-resourced languages within the IARPA Babel project. Through multilingual training of Multilayer Perceptron (MLP) BN features on five languages (Cantonese, Pashto, Tagalog, Turkish, and Vietnamese), we obtain a single feature stream that is more beneficial to all languages than the unilingual...
In this paper, we describe the RWTH speech recognition system for English lectures developed within the Translectures project. A difficulty in developing an English lecture recognition system is the high ratio of non-native speakers. We address this problem by using very effective deep bottleneck features trained on multilingual data. The acoustic model is trained on large amounts of data...
Hierarchical Multi Layer Perceptron (MLP) based long-term feature extraction is optimized for a TANDEM connectionist large vocabulary continuous speech recognition (LVCSR) system within the QUAERO project. By training the bottleneck MLP on multi-resolutional RASTA filtered critical band energies, a relative word error rate (WER) reduction of more than 20% over a standard MFCC system is observed after optimizing...
In this paper, Multi Layer Perceptron (MLP) based multilingual bottleneck features are investigated for acoustic modeling in three languages: German, French, and US English. We use a modified training algorithm to handle the multilingual training scenario without having to explicitly map the phonemes to a common phoneme set. Furthermore, the cross-lingual portability of bottleneck features between...
A part-tone decomposition of voiced sections of speech is introduced, which is adapted with high accuracy to the frequency of the glottal oscillator of the speaker. The iterative replacement of the center filter frequency contours (chosen locally as linear chirp) of the non-stationary bandpass filters converges extremely fast and leads to the extraction of filter-stable part-tones with uncorrupted...
Multi Layer Perceptron (MLP) features extracted from different types of critical band energies (CRBE), derived from the MFCC, GT, and PLP pipelines, are compared on French broadcast news and conversational speech recognition tasks. Although the MLP structure is kept fixed, ROVER combination of the different CRBE-based systems leads to a 4% relative improvement. Furthermore, aiming at the combination of state-of-the-art...
Current speech recognition systems mainly apply Short-Time Fourier Transform based features such as MFCC. Dropping the short-time stationarity assumption for voiced speech, this paper introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. The noise robustness was demonstrated on the AURORA 2 and 4 tasks,...