We use a query-by-example keyword spotting (QbyE-KWS) approach to solve the personalized wake-up word detection problem for small-footprint, low-computational-cost on-device applications. QbyE-KWS takes keywords as templates and matches them against an audio stream via dynamic time warping (DTW) to determine whether the keyword is present. In this paper, we use neural networks as acoustic models to extract DNN/LSTM phoneme posterior features and LSTM embedding features. Specifically, we investigate LSTM embedding feature extractors for different modeling units in Mandarin, ranging from phonemes to words. We also study the performance of two popular DTW approaches: S-DTW and SLN-DTW. SLN-DTW searches for the keyword in a long audio stream accurately and efficiently, without the segmentation procedure required by S-DTW approaches. Our study shows that the DNN phoneme posterior plus SLN-DTW approach achieves the highest computational efficiency and state-of-the-art performance, with a 78% relative reduction in miss rate compared with the S-DTW approach. Word-level LSTM embedding features show superior performance compared with other embedding units.
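As a rough illustration of the template-matching step described above (a sketch, not the paper's implementation), subsequence-style DTW slides a keyword template over a feature stream with free start and end points on the stream axis, so no prior segmentation is needed. The cosine distance, length normalization, and array shapes below are illustrative assumptions.

```python
import numpy as np

def subsequence_dtw(template, stream):
    """Match a keyword template against an audio stream via subsequence DTW.

    template: (n, d) array of frame-level features for the keyword.
    stream:   (m, d) array of frame-level features for the audio stream.
    Returns the best length-normalized matching cost; lower = better match.
    """
    # Pairwise cosine distances between template frames and stream frames.
    t = template / np.linalg.norm(template, axis=1, keepdims=True)
    s = stream / np.linalg.norm(stream, axis=1, keepdims=True)
    dist = 1.0 - t @ s.T                     # shape (n, m)

    n, m = dist.shape
    D = np.full((n, m), np.inf)
    D[0, :] = dist[0, :]                     # free start: match may begin at any stream frame
    for i in range(1, n):
        D[i, 0] = dist[i, 0] + D[i - 1, 0]
        for j in range(1, m):
            D[i, j] = dist[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Free end: the best alignment may finish at any stream frame.
    return D[-1, :].min() / n
```

A low returned cost indicates the keyword likely occurs somewhere in the stream; thresholding this score yields a detection decision.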