Segmental acoustic indexing for zero resource keyword search

Keith Levin; Aren Jansen; Benjamin Van Durme

doi:10.1109/ICASSP.2015.7179089

Source

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5828-5832

Abstract

The task of zero resource query-by-example keyword search has received much attention in recent years as the speech technology needs of the developing world grow. These systems traditionally rely upon dynamic time warping (DTW) based retrieval algorithms with runtimes that are linear in the size of the search collection. As a result, their scalability substantially lags that of their supervised counterparts, which take advantage of efficient word-based indices. In this paper, we present a novel audio indexing approach called Segmental Randomized Acoustic Indexing and Logarithmic-time Search (S-RAILS). S-RAILS generalizes the original frame-based RAILS methodology to word-scale segments by exploiting a recently proposed acoustic segment embedding technique. By indexing word-scale segments directly, we avoid the higher-cost frame-based processing of RAILS while taking advantage of the improved lexical discrimination of the embeddings. Using the same conversational telephone speech benchmark, we demonstrate major improvements in both speed and accuracy over the original RAILS system.
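To illustrate why fixed-dimensional segment embeddings allow lookup that is faster than a linear DTW scan, here is a minimal sketch. It is not the authors' implementation. It assumes each word-scale segment has already been mapped to a fixed-length vector, and it uses random-hyperplane hashing with a sorted signature list as a stand-in indexing scheme. The function names and the parameters `n_bits`, `seed`, and `beam` are hypothetical, and the hashing scheme is an assumption not taken from the abstract.

```python
import bisect
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Pack one sign bit per hyperplane into an integer bit signature."""
    sig = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        sig = (sig << 1) | (1 if dot >= 0.0 else 0)
    return sig


def build_index(embeddings, planes):
    """Return a sorted list of (signature, segment_id) pairs.

    Sorting is a one-time cost paid when the collection is indexed.
    """
    return sorted((signature(vec, planes), i) for i, vec in enumerate(embeddings))


def search(index, query, planes, beam=4):
    """Binary-search the query signature, then return nearby segment ids.

    The lookup costs O(log N) comparisons plus a constant-size window,
    instead of one DTW alignment per item in the collection.
    """
    q = signature(query, planes)
    pos = bisect.bisect_left(index, (q, -1))
    lo, hi = max(0, pos - beam), min(len(index), pos + beam)
    return [seg_id for _, seg_id in index[lo:hi]]
```

A single sorted list only finds candidates whose signatures agree with the query in the leading bits. A practical system would likely need several lists built from different bit orderings, followed by rescoring of the candidates with a true distance on the embeddings.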