Making Deep Belief Networks effective for large vocabulary continuous speech recognition

Tara N. Sainath; Brian Kingsbury; Bhuvana Ramabhadran; Petr Fousek; Petr Novak; Abdel-rahman Mohamed

doi:10.1109/ASRU.2011.6163900



Source

2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 30-35

Abstract

To date, there has been limited work on applying Deep Belief Networks (DBNs) to acoustic modeling for large vocabulary continuous speech recognition (LVCSR), and that work has used standard speech features. However, a typical LVCSR system makes use of both feature-space and model-space speaker adaptation, as well as discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and Gaussian mixture model/hidden Markov model (GMM/HMM) systems across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide a speed-up that is linear in the number of machines without affecting word error rate (WER).
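The abstract does not spell out the parallelization recipe itself. As a hedged illustration only, the sketch below assumes one common generic scheme, synchronous data parallelism with gradient averaging, which is not necessarily the paper's method. A toy linear model with mean squared-error loss stands in for a DBN, and the function names (`mse_gradient`, `shard`, `data_parallel_gradient`) are hypothetical. The point is that when a minibatch is split into equal shards and the per-shard gradients are averaged, the resulting update is identical to the single-machine update. This is one reason a data-parallel scheme of this kind need not change the trained model, and therefore need not change WER.

```python
# Hypothetical sketch (assumption): synchronous data-parallel gradient averaging
# for a toy linear model with mean squared-error loss. This stands in for DBN
# training and is NOT the paper's recipe.
from typing import List, Tuple

Example = Tuple[List[float], float]  # (feature vector, target)


def mse_gradient(w: List[float], batch: List[Example]) -> List[float]:
    """Gradient of mean((w . x - y)^2) over the batch with respect to w."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return [g / len(batch) for g in grad]


def shard(batch: List[Example], num_workers: int) -> List[List[Example]]:
    """Split the batch into equal, contiguous shards, one per hypothetical worker."""
    size = len(batch) // num_workers
    if size * num_workers != len(batch):
        raise ValueError("batch size must be divisible by num_workers")
    return [batch[k * size:(k + 1) * size] for k in range(num_workers)]


def data_parallel_gradient(w: List[float], batch: List[Example],
                           num_workers: int) -> List[float]:
    """Each worker computes the gradient on its shard; the results are averaged.

    With equal shard sizes, the average of per-shard mean gradients equals the
    mean gradient over the whole batch, so the update matches one machine's.
    """
    shard_grads = [mse_gradient(w, s) for s in shard(batch, num_workers)]
    return [sum(col) / num_workers for col in zip(*shard_grads)]


if __name__ == "__main__":
    batch = [([1.0, 2.0], 3.0), ([0.5, -1.0], 0.0),
             ([2.0, 0.0], 1.0), ([-1.0, 1.0], 2.0)]
    w = [0.1, -0.2]
    full = mse_gradient(w, batch)
    parallel = data_parallel_gradient(w, batch, num_workers=2)
    print(all(abs(a - b) < 1e-9 for a, b in zip(full, parallel)))  # True
```

In a real system, each shard's gradient would be computed on a separate machine, and only the averaging step would require communication. The equal-shard assumption matters: with unequal shards, the per-shard gradients would need to be weighted by shard size to recover the full-batch gradient.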