Speech separation and enhancement algorithms seldom exploit information about phoneme identities. In this study, we propose a novel phoneme-specific speech separation method. Rather than training a single global model to enhance all frames, we train a separate model for each phoneme to process its corresponding frames. A robust ASR system is employed to determine the phoneme identity of each frame. In this way, information from the ASR system and the language model directly influences speech separation through the selection of a phoneme-specific model at test time. In addition, each phoneme-specific model has less acoustic variability to capture and does not suffer from the data imbalance problem. The improved enhancement results can in turn help recognition. Experiments on the corpus of the second CHiME speech separation and recognition challenge (task-2) demonstrate the effectiveness of this method in terms of objective measures of speech intelligibility and quality, as well as recognition performance.
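To make the model-selection idea concrete, the following is a minimal sketch of the dispatch step, assuming ASR-derived phoneme labels are available per frame. The phoneme set, the `train_models` and `enhance` helpers, and the toy per-phoneme "models" (simple averaged templates) are illustrative assumptions, not the paper's actual enhancement networks; the point is only that each frame is routed to the model matching its phoneme label.

```python
import numpy as np

# Illustrative phoneme inventory (assumption; a real system would use a
# full phone set from the ASR lexicon).
PHONEMES = ["sil", "aa", "s"]

def train_models(frames_by_phoneme):
    """Fit one model per phoneme.

    Here each "model" is just the mean feature frame of that phoneme,
    standing in for a phoneme-specific enhancement model trained only on
    that phoneme's frames.
    """
    return {ph: frames.mean(axis=0) for ph, frames in frames_by_phoneme.items()}

def enhance(noisy_frames, phoneme_labels, models):
    """Process each frame with the model selected by its ASR phoneme label."""
    out = np.empty_like(noisy_frames)
    for i, (frame, ph) in enumerate(zip(noisy_frames, phoneme_labels)):
        # Toy per-phoneme processing: blend the noisy frame with the
        # phoneme template. A real system applies the phoneme's model.
        out[i] = 0.5 * frame + 0.5 * models[ph]
    return out
```

A usage example would build `frames_by_phoneme` from forced-aligned training data, call `train_models` once, and then, at test time, pass the ASR's frame-level phoneme decisions as `phoneme_labels`; frames labeled with unseen phonemes would fall back to a global model in practice.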