DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements

Yuma Koizumi; Kenta Niwa; Yusuke Hioka; Kazunori Kobayashi; Yoichi Haneda

doi:10.1109/ICASSP.2017.7952122

Source

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 81-85

Abstract

We investigated whether a deep neural network (DNN)-based source enhancement function can be self-optimized by reinforcement learning (RL). A DNN is a powerful tool for describing the relationship between two sets of variables and can be useful for designing source enhancement functions. Training the DNN on a huge amount of training data improves the sound quality of the output signals. However, collecting a huge amount of training data is often difficult in practice. To use limited training data efficiently, we focus on the “self-optimization” of a DNN-based source enhancement function by RL, a technique commonly utilized in the development of game-playing computers. As the reward for RL, we use quantitative metrics that reflect a human's perceptual score, e.g., perceptual evaluation methods for audio source separation (PEASS). To investigate whether RL-based source enhancement improves the sound quality, subjective tests were conducted. They confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations increased and finally outperformed the conventional method.
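
The idea of using a quality score as an RL reward can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify the RL algorithm, the network, or how PEASS is computed, so everything below is an assumption made for illustration. The sketch swaps the DNN for a softmax policy over three candidate suppression gains, uses a plain REINFORCE update with a running baseline, and replaces the perceptual metric with a hypothetical stand-in (`stand_in_quality_score`, a negative mean squared error against the clean signal).

```python
import numpy as np


def stand_in_quality_score(enhanced, clean):
    # Hypothetical reward (assumption): higher is better.
    # A real system would plug a perceptual metric such as PEASS in here.
    return -float(np.mean((enhanced - clean) ** 2))


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def self_optimize(noisy, clean, gains, n_iters=2000, lr=0.5, seed=0):
    """REINFORCE over a discrete set of candidate gains (toy policy, not a DNN)."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(gains))  # policy parameters
    baseline = 0.0                 # running average of rewards
    for _ in range(n_iters):
        probs = softmax(logits)
        action = rng.choice(len(gains), p=probs)      # sample an enhancement
        enhanced = gains[action] * noisy              # apply it
        reward = stand_in_quality_score(enhanced, clean)
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logits for a softmax policy.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        logits += lr * (reward - baseline) * grad_log_pi
    return softmax(logits)


# Toy data (assumption): a 220 Hz tone plus white noise of equal power.
data_rng = np.random.default_rng(1)
t = np.arange(4000) / 8000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noisy = clean + data_rng.normal(0.0, np.sqrt(0.5), size=t.shape)

gains = np.array([0.1, 0.5, 1.0])  # hypothetical candidate gains
probs = self_optimize(noisy, clean, gains)
print("policy over gains:", dict(zip(gains.tolist(), np.round(probs, 3).tolist())))
```

With equal signal and noise power, the gain that minimizes the stand-in error is close to 0.5, so the policy should concentrate on that candidate as the iterations proceed. The point of the sketch is only that the update needs a scalar score per output rather than a per-sample target signal.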