Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition

Antti Hurmalainen; Tuomas Virtanen

doi:10.1109/ICASSP.2012.6288823

Source

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4113 - 4116

Abstract

Non-negative spectral factorisation has been used successfully for separation of speech and noise in automatic speech recognition, both in feature-enhancing front-ends and in direct classification. In this work, we propose employing spectro-temporal 2D filters to model dynamic properties of Mel-scale spectrogram patterns in addition to static magnitude features. The results are evaluated using an exemplar-based sparse classifier on the CHiME noisy speech database. After optimisation of static features and modelling of temporal dynamics with derivative features, we achieve an average score of 87.4% over SNRs from 9 to −6 dB, reducing the word error rate by 28.1% compared with our previous static-only features.
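
As a rough illustration of the derivative features mentioned in the abstract, the following Python sketch applies a time-derivative filter to each band of a Mel-scale magnitude spectrogram and stacks the result with the static features. The abstract does not specify the filters used in the paper, so the function names, the regression-style kernel, the filter width and the split of the signed derivative into non-negative parts are all assumptions made for this example.

```python
import numpy as np


def temporal_delta(mel_spec, width=2):
    """Time-derivative of a (bands x frames) spectrogram.

    Uses a regression-style delta kernel with taps -width..width.
    The kernel choice is an assumption, not taken from the paper.
    """
    taps = np.arange(-width, width + 1, dtype=float)
    kernel = taps / np.sum(taps ** 2)
    # Replicate edge frames so the output keeps the original frame count.
    padded = np.pad(mel_spec, ((0, 0), (width, width)), mode="edge")
    n_frames = mel_spec.shape[1]
    delta = np.zeros_like(mel_spec, dtype=float)
    # Weighted sum of time-shifted copies, i.e. filtering along the time axis.
    for k, weight in enumerate(kernel):
        delta += weight * padded[:, k:k + n_frames]
    return delta


def stack_static_and_dynamic(mel_spec, width=2):
    """Stack static magnitudes with non-negative dynamic features.

    Non-negative factorisation needs non-negative input, so this sketch
    splits the signed derivative into positive and negative parts
    (an assumption made here for illustration).
    """
    delta = temporal_delta(mel_spec, width)
    positive = np.maximum(delta, 0.0)
    negative = np.maximum(-delta, 0.0)
    return np.vstack([mel_spec, positive, negative])
```

For a spectrogram with B Mel bands and T frames, `stack_static_and_dynamic` returns an array of shape (3B, T) that stays non-negative and could in principle be passed to a factorisation routine in place of the static features alone.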

Identifiers

book ISSN :	1520-6149
book e-ISSN :	1520-6149
book ISBN :	978-1-4673-0045-2
book e-ISBN :	978-1-4673-0046-9, 978-1-4673-0044-5
DOI :	10.1109/ICASSP.2012.6288823