Recently, the hybrid of deep neural networks (DNNs) and hidden Markov models (HMMs) has proven to be a powerful sequence-learning framework, yielding significant improvements in many application tasks such as automatic speech recognition (ASR). However, training a DNN-HMM requires pre-segmented (frame-aligned) training data, which in ASR is typically generated with a Gaussian mixture model (GMM). This raises a natural question: can a DNN-HMM be trained without GMM seeding, and what are the implications if the answer is yes? In this work, we answer in the affirmative by presenting a forward-backward learning algorithm for the DNN-HMM framework. In addition, we propose a training procedure in which training a context-independent (CI) DNN-HMM serves as pre-training for a context-dependent (CD) DNN-HMM. To evaluate the contribution of this work, we perform experiments on the benchmark TIMIT ASR corpus, and the results demonstrate the effectiveness of the approach.
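To make the core idea concrete, the following is a minimal sketch (not the paper's actual implementation) of the classical forward-backward recursions on a toy HMM. The per-frame state posteriors gamma[t, s] that it produces are the kind of soft targets that can replace GMM-derived frame alignments when training the DNN; all toy parameter values here are hypothetical.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True)), axis=axis)

def forward_backward(log_pi, log_A, log_B):
    """Compute per-frame state posteriors for one utterance.

    log_pi: (S,)   initial state log-probabilities
    log_A:  (S, S) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (T, S) per-frame emission log-likelihoods (e.g. from a DNN)
    Returns gamma: (T, S) with gamma[t, s] = P(q_t = s | o_1..o_T).
    """
    T, S = log_B.shape
    log_alpha = np.zeros((T, S))
    log_beta = np.zeros((T, S))

    # Forward pass: alpha[t, s] = P(o_1..o_t, q_t = s)
    log_alpha[0] = log_pi + log_B[0]
    for t in range(1, T):
        log_alpha[t] = log_B[t] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)

    # Backward pass: beta[t, s] = P(o_{t+1}..o_T | q_t = s); beta[T-1] = 1
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(log_A + (log_B[t + 1] + log_beta[t + 1])[None, :], axis=1)

    # State posteriors, normalized per frame
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1)[:, None]
    return np.exp(log_gamma)

# Toy 3-state HMM over 5 frames (hypothetical values)
rng = np.random.default_rng(0)
log_pi = np.log(np.array([0.6, 0.3, 0.1]))
A = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.1, 0.1, 0.8]])
log_B = rng.normal(size=(5, 3))  # stand-in for DNN emission scores
gamma = forward_backward(log_pi, np.log(A), log_B)
```

Each row of `gamma` sums to one, so the posteriors can be used directly as soft frame-level training targets, bypassing the need for a GMM-based forced alignment.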