In this study, we introduce Kernel Additive Modelling, a new framework for audio spectrograms that can be used for multichannel source separation. It assumes that the spectrogram of a source at any time-frequency bin is close to its values in a neighbourhood indicated by a source-specific proximity kernel. The rationale for this model is to easily account for features such as periodicity, stability over time or frequency, and self-similarity. In many cases, such local dynamics are much more natural to assess than any global model such as a tensor factorization. The framework permits different proximity kernels for different sources, and allows them to be estimated blindly from their mixtures alone. Estimation is performed with a variant of the kernel backfitting algorithm that handles multichannel mixtures and permits parallelization. Experimental results on the separation of vocals from musical backgrounds demonstrate the efficiency of the approach.
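The core idea can be illustrated with a minimal single-channel sketch. The function below is hypothetical (not the paper's reference implementation): each source is repeatedly re-fitted by filtering its current estimate with its own proximity kernel (a median filter, as a robust local fit), and the mixture energy is then redistributed among the fitted sources with a Wiener-like mask. A horizontal kernel favours sources that are stable over time (e.g. sustained harmonics), while a vertical kernel favours sources that are smooth over frequency (e.g. percussive events).

```python
import numpy as np
from scipy.ndimage import median_filter

def kernel_backfitting(mix_power, footprints, n_iter=5, eps=1e-12):
    """Toy single-channel kernel backfitting (illustrative sketch).

    mix_power : (F, T) power spectrogram of the mixture.
    footprints: one boolean proximity kernel per source, e.g. a
                horizontal line (stability over time) or a vertical
                line (smoothness over frequency).
    """
    n_sources = len(footprints)
    # initialise every source estimate as an equal share of the mixture
    sources = [mix_power / n_sources for _ in range(n_sources)]
    for _ in range(n_iter):
        # backfit: re-estimate each source by filtering its current
        # estimate with its own kernel (median as a robust local fit)
        fitted = [median_filter(s, footprint=k)
                  for s, k in zip(sources, footprints)]
        total = sum(fitted) + eps
        # redistribute the mixture energy (Wiener-like masking)
        sources = [mix_power * f / total for f in fitted]
    return sources

# Toy example: a horizontal "harmonic" line plus a vertical "percussive"
# line; a horizontal and a vertical kernel pull them apart.
mix = np.zeros((16, 16))
mix[8, :] += 1.0   # time-stable component
mix[:, 8] += 1.0   # frequency-smooth component
estimates = kernel_backfitting(
    mix,
    [np.ones((1, 7), dtype=bool),   # horizontal kernel
     np.ones((7, 1), dtype=bool)])  # vertical kernel
```

In this toy case the horizontal-kernel source recovers the time-stable line and the vertical-kernel source the frequency-smooth one, and by construction the estimates always sum back to the mixture.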