Gradient-based acoustic features for speech recognition

Takashi Muroi; Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

doi:10.1109/ISPACS.2009.5383805

Source

2009 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 445-448

Abstract

This paper proposes a novel feature extraction method for speech recognition based on gradient features on a 2D time-frequency matrix. Widely used MFCC features lack temporal dynamics. In addition, ΔMFCC is an indirect expression of temporal frequency changes. To extract the temporal dynamics more directly, we propose local gradient features in an area around a reference position. The gradient-based features were originally proposed as HOG (histograms of oriented gradients) and applied to human body detection in image recognition. In this paper, we expand the application to include gradient-based acoustic features in speech recognition. The novel acoustic features were evaluated on a word-speech recognition task, and the results showed a significant improvement for clean speech and even for noisy speech when coupled with MFCC.
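
The idea of a local gradient feature around a reference position can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the patch size, the number of orientation bins, or the normalization, so the function name `local_gradient_histogram` and the parameters `half_width` and `n_bins` are assumptions chosen for illustration. The sketch treats a spectrogram as a 2D matrix (rows are frequency bins, columns are time frames) and builds a HOG-style, magnitude-weighted histogram of gradient orientations in a small patch.

```python
import numpy as np

def local_gradient_histogram(spec, t, f, half_width=2, n_bins=8):
    """HOG-style histogram of gradient orientations in a patch centred on
    frequency bin f and time frame t of a 2D time-frequency matrix.

    spec has shape (n_freq, n_time). half_width and n_bins are
    illustrative assumptions, not values taken from the paper.
    """
    # Finite-difference gradients along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(spec)
    magnitude = np.hypot(d_time, d_freq)
    # Unsigned orientation folded into [0, pi), a common HOG convention.
    orientation = np.mod(np.arctan2(d_freq, d_time), np.pi)

    # Clip the patch to the matrix boundaries.
    f0, f1 = max(f - half_width, 0), min(f + half_width + 1, spec.shape[0])
    t0, t1 = max(t - half_width, 0), min(t + half_width + 1, spec.shape[1])
    mag = magnitude[f0:f1, t0:t1].ravel()
    ori = orientation[f0:f1, t0:t1].ravel()

    # Map each orientation to a bin; guard against rounding up to n_bins.
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)

    # L2-normalize so the feature reflects direction rather than energy.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

Under these assumptions, a matrix that changes only along the time axis puts all of its weight in the first orientation bin, while one that changes only along the frequency axis concentrates its weight in the middle bin. Such a vector could be concatenated with MFCC features for each frame, in the spirit of the combination the abstract reports.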