Deep neural networks (DNNs) have achieved significant accuracy improvements in many large vocabulary continuous speech recognition (LVCSR) tasks. Recently, it was shown that even better performance can be obtained by modeling a larger number of more discriminative senones. However, as the network grows, the number of parameters increases greatly, raising computation cost and slowing decoding. Since most DNN computation in LVCSR systems occurs in the output softmax layer, we propose a senone weight vector selection method to speed up the softmax computation while keeping system accuracy nearly unchanged. We cluster the weight vectors of the softmax layer, grouping all senone weight vectors into several clusters. During decoding, we compute exact posteriors only for senones in the selected clusters; the posteriors of senones in the unselected clusters are approximated by their cluster centers. Experimental results show that the proposed method reduces DNN computation time by more than 35% with negligible accuracy loss on a DNN model with 60,000 senones on Switchboard.
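The selection scheme described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the dimensions, the number of clusters, the `top_k` selection rule, and the plain Lloyd's k-means are all assumptions chosen for brevity (the paper's model has roughly 60,000 senones).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical; far smaller than a real LVCSR softmax layer).
hidden_dim, n_senones, n_clusters = 16, 200, 8

# Softmax-layer parameters: one weight vector (plus bias) per senone.
W = rng.normal(size=(n_senones, hidden_dim))
b = rng.normal(size=n_senones)

# --- Offline: k-means over the senone weight vectors (a few Lloyd steps) ---
centers = W[rng.choice(n_senones, n_clusters, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((W[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(n_clusters):
        if (assign == k).any():
            centers[k] = W[assign == k].mean(axis=0)

# --- Decoding: exact logits only for senones in the best-scoring clusters ---
def approx_logits(h, top_k=3):
    center_scores = centers @ h                    # one dot product per cluster
    selected = np.argsort(center_scores)[-top_k:]  # clusters expanded exactly
    logits = center_scores[assign]                 # default: center approximation
    mask = np.isin(assign, selected)
    logits[mask] = W[mask] @ h + b[mask]           # exact scores where selected
    return logits

h = rng.normal(size=hidden_dim)    # a hidden-layer activation at one frame
exact = W @ h + b
approx = approx_logits(h)
```

With `top_k` clusters expanded, the per-frame cost drops from `n_senones` dot products to `n_clusters` plus the senones inside the selected clusters, which is the source of the reported speed-up.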