The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end...
In this paper, we present an investigation on technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare the effective filter configurations, pooling types and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project...
Proxy-word based out of vocabulary (OOV) keyword search has been proven to be quite effective in keyword search. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...
In recent years, so-called, “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard,...
Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active...
Multilingual (ML) representations play a key role in building speech recognition systems for low resource languages. The IARPA sponsored BABEL program focuses on building speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism to derive ML representations in the BABEL program has been with the use of a two-stage network,...
In this paper we report our approaches to accomplishing the very limited resource keyword search (KWS) task in the NIST Open Keyword Search 2015 (OpenKWS15) Evaluation. We devised the methods, first, to attain better acoustic modeling, multilingual and semi-supervised acoustic model training as well as the examplar-based acoustic model training; second, to address the overwhelming out-of-vocabulary...
This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks (CNNs). The approaches are focused on increasing...
The OpenKWS14 keyword search evaluation is one of the most challenging and influential evaluations in the field of speech recognition. Its goal is to build a high-performance keyword search system for a minority language with limited training data in a short period of time. We present the system of the Department of Electronic Engineering, Tsinghua University (THUEE team) for the OpenKWS14 keyword...
In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource languages in telephony speech...
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization...
This paper considers an unsupervised data selection problem for the training data of an acoustic model and the vocabulary coverage of a keyword search system in low-resource settings. We propose to use Gaussian component index based n-grams as acoustic features in a submodular function for unsupervised data selection. The submodular function provides a near-optimal solution in terms of the objective...
This paper investigates a weighted finite state transducer (WFST) based syllable decoding and transduction method for keyword search (KWS), and compares it with sub-word search and phone confusion methods in detail. Acoustic context dependent phone models are trained from word forced alignments and then used for syllable decoding and lattice generation. Out-of-vocabulary (OOV) keyword pronunciations...
In this paper we describe approaches to building our recent Malay broadcast news audio retrieval system. This system contains speech-to-text and keyword search subsystems. The speech-to-text system is built aiming at two folds: hybrid vocabulary recognition to tackle out-of-vocabulary keyword search issue and diversified acoustic modeling for effective system combination in keyword searching afterwards...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.