The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, which are motivated by the goal of building compact and accurate speech recognition systems which can be run efficiently on mobile devices. In this work, we present a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent...
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its...
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field conditions. We find that system performance can be improved significantly, with relative improvements up to 75% in far-field conditions, by employing a combination of multi-style training and a proposed novel formulation...
We study spoken term detection (STD) - the task of determining whether and where a given word or phrase appears in a given segment of speech - using articulatory feature-based pronunciation models. The models are motivated by the requirements of STD in low-resource settings, in which it may not be feasible to train a large-vocabulary continuous speech recognition system, as well as by the need to...
Several methods have been proposed recently for modeling posterior representations derived from local classifiers [1, 2]. In recent work, Sainath et al. have proposed the use of a tied-mixture-based posterior modeling approach [3] to enhance exemplar-based posterior representations for phone recognition tasks. In this work, we conduct a detailed evaluation to determine the effectiveness of this technique...
Conditional random fields (CRFs) are probabilistic sequence models that have been applied in the last decade to a number of applications in audio, speech, and language processing. In this paper, we provide a tutorial overview of CRF technologies, pointing to other resources for more in-depth discussion; in particular, we describe the common linear-chain model as well as a number of common extensions...
We propose a chunk-based phonetic score for re-scoring word hypotheses for the mobile voice search task. The score is based on a novel technique for aligning decoded phone sequences with forced-alignments of hypothesized word sequences and exploits phone-boundary timing information. In experimental results, we find that the proposed approach results in relative a word error rate reduction of 4.4%...
We investigate joint models of articulatory features and apply these models to the problem of automatically generating articulatory transcriptions of spoken utterances given their word transcriptions. The task is motivated by the need for larger amounts of labeled articulatory data for both speech recognition and linguistics research, which is costly and difficult to obtain through manual transcription...
Conditional random fields (CRFs) have recently found increased popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective combiners of posterior estimates from multilayer perceptrons (MLPs) in phone and word recognition tasks. In this paper, we describe a novel hybrid Multilayer-CRF structure (ML-CRF), where a MLP-like hidden layer serves as input...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.