The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Acoustic beamforming can greatly improve the performance of Automatic Speech Recognition(ASR) and speech enhancement systems when multiple channels are available. We recently proposed a way to support the model-based Generalized Eigenvalue beamforming operation with a powerful neural network for spectral mask estimation. The enhancement system has a number of desirable properties. In particular, neither...
Multi-channel speech enhancement algorithms rely on a synchronous sampling of the microphone signals. This, however, cannot always be guaranteed, especially if the sensors are distributed in an environment. To avoid performance degradation the sampling rate offset needs to be estimated and compensated for. In this contribution we extend the recently proposed coherence drift based method in two important...
In this paper we show how a neural network for spectral mask estimation for an acoustic beamformer can be optimized by algorithmic differentiation. Using the beamformer output SNR as the objective function to maximize, the gradient is propagated through the beamformer all the way to the neural network which provides the clean speech and noise masks from which the beamformer coefficients are estimated...
This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex valued beamforming operation...
We present a neural network based approach to acoustic beamforming. The network is used to estimate spectral masks from which the Cross-Power Spectral Density matrices of speech and noise are estimated, which in turn are used to compute the beamformer coefficients. The network training is independent of the number and the geometric configuration of the microphones. We further show that it is possible...
We present an algorithm for clustering complex-valued unit length vectors on the unit hypersphere, which we call complex spherical k-mode clustering, as it can be viewed as a generalization of the spherical k-means algorithm to normalized complex-valued vectors. We show how the proposed algorithm can be derived from the Expectation Maximization algorithm for complex Watson mixture models and prove...
We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd-CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bi-directional Long Short-Term Memory...
This contribution presents a Direction of Arrival (DoA) estimation algorithm based on the complex Watson distribution to incorporate both phase and level differences of captured microphone array signals. The derived algorithm is reviewed in the context of the Generalized State Coherence Transform (GSCT) on the one hand and a kernel density estimation method on the other hand. A thorough simulative...
In this paper we present a source counting algorithm to determine the number of speakers in a speech mixture. In our proposed method, we model the histogram of estimated directions of arrival with a non-parametric Bayesian infinite Gaussian mixture model. As an alternative to classical model selection criteria and to avoid specifying the maximum number of mixture components in advance, a Dirichlet...
This contribution describes a step-wise source counting algorithm to determine the number of speakers in an offline sce-nario. Each speaker is identified by a variational expectation maximization (VEM) algorithm for complex Watson mixture models and therefore directly yields beamforming vectors for a subsequent speech separation process. An observation selection criterion is proposed which improves...
Vehicle-to-grid (V2G) energy transfer in a smart grid environment opens a new revenue opportunity for electric-drive vehicles (EVs), and might reduce grid operation costs in demand-constrained urban feeders where peak-electricity prices are high. This paper analyses the peak demand energy market for V2G in the urban region of Florianópolis, Brazil. The article describes known V2G-concepts and introduces...
In this contribution we derive a variational EM (VEM) algorithm for model selection in complex Watson mixture models, which have been recently proposed as a model of the distribution of normalized microphone array signals in the short-time Fourier transform domain. The VEM algorithm is applied to count the number of active sources in a speech mixture by iteratively estimating the mode vectors of the...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.