Recently, we have proposed a general adaptation scheme for deep neural networks based on discriminant condition codes and applied it to supervised speaker adaptation in speech recognition based on either the frame-level cross-entropy or the sequence-level maximum mutual information training criterion [1, 2, 3, 4]. In this case, each condition code is associated with one speaker in the data, which is thus called...
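The condition-code idea can be illustrated with a toy forward pass (a minimal sketch with made-up dimensions and a two-layer network; none of the names or sizes come from [1, 2, 3, 4]): a per-speaker code vector is appended to the acoustic input, and during adaptation only the codes would be re-estimated while the base network stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not from the paper):
feat_dim, code_dim, hidden_dim, out_dim = 40, 8, 64, 10

# One condition code per speaker; in adaptation, only these vectors
# would be updated while the shared network weights stay fixed.
speaker_codes = {"spk1": rng.normal(size=code_dim),
                 "spk2": rng.normal(size=code_dim)}

# Base network weights (would come from supervised training in practice).
W1 = rng.normal(size=(hidden_dim, feat_dim + code_dim)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(out_dim, hidden_dim)) * 0.1
b2 = np.zeros(out_dim)

def forward(x, speaker):
    """Forward pass with the speaker's condition code appended to the input."""
    z = np.concatenate([x, speaker_codes[speaker]])
    h = np.maximum(0.0, W1 @ z + b1)      # ReLU hidden layer
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())     # softmax over output classes
    return e / e.sum()

frame = rng.normal(size=feat_dim)
p = forward(frame, "spk1")                # posterior over out_dim classes
```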
Recently, it has been reported that context-dependent deep neural networks (DNNs) have achieved unprecedented gains in many challenging ASR tasks, including the well-known Switchboard task. In this paper, we first investigate DNNs for several large vocabulary speech recognition tasks. Our results have confirmed that DNNs can consistently achieve about a 25–30% relative error reduction over the best...
Convolutional Neural Networks (CNNs) have shown success in achieving translation invariance for many image processing tasks. This success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNNs to speech recognition within the framework of the hybrid NN-HMM model. We propose to use local filtering and max-pooling in the frequency domain...
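Local filtering and max-pooling along the frequency axis can be sketched as follows (a minimal NumPy toy with made-up dimensions, not the paper's CNN): each small filter slides over frequency only, and pooling keeps the largest activation in each band, which gives some tolerance to small spectral shifts.

```python
import numpy as np

rng = np.random.default_rng(1)

n_freq, filt_width, n_filters, pool = 40, 8, 4, 2

spectrum = rng.normal(size=n_freq)                  # one frame of filter-bank features
filters = rng.normal(size=(n_filters, filt_width))  # local filters shared across frequency

# Local filtering: each filter slides along the frequency axis only.
n_pos = n_freq - filt_width + 1
conv = np.empty((n_filters, n_pos))
for k in range(n_filters):
    for i in range(n_pos):
        conv[k, i] = filters[k] @ spectrum[i:i + filt_width]
act = np.maximum(0.0, conv)                         # nonlinearity

# Max-pooling in frequency: keep the largest activation per band of `pool`
# positions; small spectral shifts then change the output little.
pooled = act[:, : (n_pos // pool) * pool].reshape(n_filters, -1, pool).max(axis=2)
```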
In this paper, we have proposed a new method to construct an auxiliary function for the discriminative training of HMMs in speech recognition. The new auxiliary function serves as a first-order approximation of the original objective function and, more importantly, remains a lower bound of it as well. Furthermore, the trust region (TR) method in [1] is applied to find...
We present a novel discriminative training algorithm for n-gram language models for use in large vocabulary continuous speech recognition. The algorithm uses large margin estimation (LME) to build an objective function for maximizing the minimum margin between correct transcriptions and their competing hypotheses, which are encoded as word graphs generated from the Viterbi decoding process. The nonlinear...
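The minimum-margin quantity that LME maximizes can be sketched with a toy computation (illustrative made-up scores; a real system would read the competing hypotheses from the word graphs produced by Viterbi decoding):

```python
def min_margin(utterances):
    """Minimum margin over a training set: for each utterance, the score of
    the correct transcription minus the best competing-hypothesis score."""
    margins = []
    for ref_score, competitor_scores in utterances:
        margins.append(ref_score - max(competitor_scores))
    return min(margins)

# Hypothetical log-scores: (reference score, list of competitor scores).
data = [(-120.0, [-125.0, -130.5]),   # margin  5.0: correctly recognized
        (-98.5,  [-97.0, -101.2])]    # margin -1.5: a competitor wins
m = min_margin(data)                  # → -1.5
```

LME would then adjust the model parameters so this minimum margin grows, pushing the worst-separated utterance away from its competitors.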
This paper presents a novel discriminative training algorithm for n-gram language models for use in large vocabulary continuous speech recognition. The algorithm uses Maximum Mutual Information Estimation (MMIE) to build an objective function that involves a metric computed between correct transcriptions and their competing hypotheses, which are encoded as word graphs generated from the Viterbi decoding...
In this paper, we present a new fast optimization method to solve large margin estimation (LME) of continuous density hidden Markov models (CDHMMs) for speech recognition based on second order cone programming (SOCP). SOCP is a class of nonlinear convex optimization problems which can be solved very efficiently. In this work, we have formulated the LME of CDHMMs as an SOCP problem and proposed two...
In this paper, we present a new optimization method for MMIE-based discriminative training of HMMs in speech recognition. In our method, the MMIE training of Gaussian mixture HMMs is formulated as a so-called trust region problem, where a quadratic objective function is minimized under a spherical constraint, so that an efficient global optimization method for the trust region problem can be used...
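The trust region subproblem named here, a quadratic minimized under a spherical constraint, admits an efficient global solution via the Lagrange multiplier of the constraint. Below is a minimal sketch by bisection on that multiplier (an illustrative solver, not the paper's implementation; the degenerate "hard case" is ignored):

```python
import numpy as np

def trust_region_solve(A, b, delta, tol=1e-10):
    """Globally minimize 0.5 x^T A x + b^T x subject to ||x|| <= delta.

    Uses the standard characterization x(lam) = -(A + lam*I)^{-1} b with
    lam >= max(0, -eigmin(A)), and bisects on lam until ||x(lam)|| = delta.
    The hard case (b orthogonal to the bottom eigenvector) is not handled.
    """
    n = len(b)
    eigmin = np.linalg.eigvalsh(A)[0]
    # Interior solution if A is positive definite and the Newton step fits.
    if eigmin > 0:
        x = np.linalg.solve(A, -b)
        if np.linalg.norm(x) <= delta:
            return x
    lam_lo = max(0.0, -eigmin) + 1e-12
    lam_hi = lam_lo + 1.0
    # Grow lam_hi until the step is inside the sphere; ||x(lam)|| decreases in lam.
    while np.linalg.norm(np.linalg.solve(A + lam_hi * np.eye(n), -b)) > delta:
        lam_hi *= 2.0
    for _ in range(200):
        lam = 0.5 * (lam_lo + lam_hi)
        x = np.linalg.solve(A + lam * np.eye(n), -b)
        if np.linalg.norm(x) > delta:
            lam_lo = lam
        else:
            lam_hi = lam
        if lam_hi - lam_lo < tol:
            break
    return x
```

Even when A is indefinite (as a quadratic model of an MMIE objective may be), the boundary solution found this way is the global minimizer of the subproblem.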
Recently, we proposed a novel optimization algorithm called constrained line search (CLS) to train Gaussian mean vectors of HMMs in the MMI sense. In this paper, we extend and re-formulate it in a more general framework. The new CLS can optimize any discriminative objective functions including MMI, MCE, MPE/MWE etc. Also, closed-form solutions to update all Gaussian mixture parameters, including means,...
In this paper, we propose a novel constrained line search to optimize the MMI objective function for training discriminative HMMs. In our method, the MMI estimation is cast as a constrained maximization problem, where the Kullback-Leibler divergence between the models before and after parameter adjustment is introduced as a constraint during optimization. Then, based on the idea of line search, we show...
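The KL-constrained step can be sketched for a single diagonal-covariance Gaussian mean (an illustrative simplification, not the paper's full derivation): a step along the gradient is shrunk so that the divergence between the old and new model stays within a trust radius `rho`. For two Gaussians sharing the same diagonal variance, the KL divergence reduces to a scaled squared distance between means, so it grows quadratically with the step size.

```python
import numpy as np

def kl_same_cov(mu_old, mu_new, var):
    """KL divergence between two diagonal Gaussians sharing variance `var`."""
    d = mu_new - mu_old
    return 0.5 * np.sum(d * d / var)

def constrained_mean_update(mu, var, grad, rho):
    """Move the mean along the ascent direction, then shrink the step so
    that KL(old || new) stays within the trust radius rho."""
    step = grad                             # full step along the gradient
    kl = kl_same_cov(mu, mu + step, var)
    if kl > rho:
        step = step * np.sqrt(rho / kl)     # KL is quadratic in the step size
    return mu + step

mu = np.array([0.0, 0.0])
var = np.array([1.0, 4.0])
grad = np.array([3.0, 2.0])                 # illustrative gradient values
new_mu = constrained_mean_update(mu, var, grad, rho=0.5)
```

After shrinking, the new mean sits exactly on the KL trust boundary whenever the unconstrained step would have overshot it.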
We propose to use minimum divergence (MD), where acoustic similarity between HMMs is characterized by Kullback-Leibler divergence (KLD), for discriminative training. The MD objective function is defined as a posterior-weighted divergence measured over the whole training set. Different from our earlier work, where KLD-based acoustic similarity is pre-computed for all initial models and stays invariant in the...
Our previous study on maximum relative margin estimation (MRME) of HMM (C. Liu et al., 2005) demonstrated its advantage over the standard minimum classification error (MCE) training. In this paper, we report our recent improvement on MRME. Specifically, two novel approaches are proposed to handle recognition errors in training sets for the MRME. One is a new training criterion based on a combination...
In this paper, we propose a new optimization method, i.e., constrained joint optimization method, to solve the minimax optimization problem in large margin estimation (LME) of continuous density hidden Markov model (CDHMM) for speech recognition. First, we mathematically analyze the definition of margin and introduce some theoretically-sound constraints into the minimax optimization to guarantee the...
Based on the principle of large margin classifiers, we recently proposed two novel training methods, namely large margin estimation (LME) [8] and maximum relative margin estimation (MRME) [9], for speech recognition. In LME or MRME, HMM parameters are estimated to maximize the minimum margin among all training utterances. However, their original formulation is limited to isolated-word ASR tasks. In this...
In this paper, we propose a dynamic in-search discriminative training approach for large-scale HMMs in large vocabulary speech recognition. A previously proposed data selection method is used to choose competing hypotheses dynamically during the Viterbi beam search procedure. In particular, all active word-ending paths are examined during search against the reference transcription to identify competing...