Non-negative matrix factorisations are used in several branches of signal processing and data analysis for separation and classification. Sparsity constraints are commonly set on the model to promote discovery of a small number of dominant patterns. In group sparse models, atoms considered to belong to a consistent group are permitted to activate together, while activations across groups are suppressed,...
This paper deals with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture...
In this work, we consider enhancing target speech from a single-channel noisy observation corrupted by non-stationary noises at low signal-to-noise ratios (SNRs). We take a classification-based approach, where the objective is to estimate an Ideal Binary Mask (IBM) that classifies each time-frequency (T-F) unit of the noisy observation into one of two categories: speech-dominant unit or noise-dominant...
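As an illustration of the mask described in the abstract above, the following minimal sketch computes an oracle IBM from separately known speech and noise power spectrograms. It is not the paper's estimator, which must predict the mask from the noisy observation alone. The function name `ideal_binary_mask`, the threshold parameter `lc_db` (a local SNR criterion, set here to 0 dB) and the toy values are assumptions made for this example.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each T-F unit 1 (speech-dominant) or 0 (noise-dominant).

    speech_power, noise_power: arrays of equal shape holding the power of
    the clean speech and of the noise in each time-frequency unit.
    lc_db: hypothetical local SNR threshold in dB (assumed, not from the paper).
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.uint8)

# Toy 2x2 "spectrograms" (rows: frequency bins, columns: frames).
speech = np.array([[4.0, 0.1], [1.0, 9.0]])
noise = np.array([[1.0, 1.0], [2.0, 3.0]])
mask = ideal_binary_mask(speech, noise)
print(mask)
```

Multiplying the noisy spectrogram by such a mask keeps the speech-dominant units and discards the rest, which is why the mask serves as a classification target.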
In this paper we propose a novel multi-channel algorithm to separate simultaneous speakers in an environment where the microphone array is subject to movement. When the microphones are mounted to a person's head, for instance, the movements can lead to ambiguities with respect to the sources and to distortions in the processed signal. The proposed system estimates the direction-of-arrival of the speaker's...
We present a novel approach to query-by-example keyword spotting (KWS) using a long short-term memory (LSTM) recurrent neural network-based feature extractor. In our approach, we represent each keyword using a fixed-length feature vector obtained by running the keyword audio through a word-based LSTM acoustic model. We use the activations prior to the softmax layer of the LSTM as our keyword-vector...
Discontinuous Transmission (DTX) is an efficient way to drastically reduce the transmission rate of a communication codec in the absence of voice input. In this mode, most frames that are determined to consist of background noise only are dropped from transmission and replaced by some Comfort Noise Generation (CNG) in the decoder. In this paper, we propose a novel CNG approach combining information...
A hearing model, which is parameterized by hearing thresholds, degrees of loudness recruitment and reductions of frequency resolution of a hearing-impaired (HI) patient, is proposed in this paper. The model is developed in the filter-bank framework and is flexible for fitting hearing-loss conditions of HI patients. Psychoacoustic experiments were conducted under clean and noisy conditions to validate...
The combination of noise and reverberation makes listening conditions difficult for cochlear implant (CI) users. The perceptual effect of reverberation was evaluated via speech intelligibility tests with CI users. A fixed directional microphone, an adaptive directional microphone and a beamformer post-filter were evaluated. Reverberation was varied by changing the target and noise distance and by simulating...
We present a general single-channel speech dereverberation method based on an explicit generative model of reverberant and noisy speech. To regularize the model, we use a pre-learned speech model of clean and dry speech as a prior and perform posterior inference over the latent clean speech. The reverberation kernel and additive noise are estimated under the maximum-likelihood framework. Our model...
The estimation of the decay rate of a signal section is an integral component of both blind and non-blind reverberation time estimation methods. Several decay rate estimators have previously been proposed, based on, e.g., linear regression and maximum-likelihood estimation. Unfortunately, most approaches are sensitive to background noise, and/or are fairly demanding in terms of computational complexity...
The Vector Taylor Series (VTS) based model compensation approach has been successfully applied to various robust speech recognition tasks. In this paper, we propose a novel method of variable transformation to calculate the static statistics. In addition, we provide a detailed explanation of VTS and random variable transformations adopted in some recent papers. Experiments on Aurora 4 showed that the...
Beamforming and channel equalizers can be formulated as optimal multichannel filter-and-sum operations with different objective criteria. It has been shown in previous studies that the combination of both concepts under a common framework can yield results that combine both the spatial robustness of beamforming and the dereverberation performance of channel equalization. This paper introduces an additional...
Speech and audio signal processing research is a tale of data collection efforts and evaluation campaigns. Large benchmark datasets for automatic speech recognition (ASR) have been instrumental in the advancement of speech recognition technologies. However, when it comes to robust ASR, source separation, and localization, especially using microphone arrays, the perfect dataset is out of reach, and...
The presence of Lombard Effect in speech is proven to have severe effects on the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
We introduce in this paper a novel non-blind speech enhancement procedure based on visual speech recognition (VSR). The latter is based on a generative process that analyzes sequences of talking faces and classifies them into visual speech units known as visemes. We use an effective graphical model able to segment and label a given sequence of talking faces into a sequence of visemes. Our model captures...
In this paper a problem in transient noise suppression for audio streams in laptop and netbook devices is addressed. One or more microphones record voice signals which are corrupted with ambient noise and also transient noise from keyboard and mouse clicks. In the current work, a synchronous reference microphone is embedded in the keyboard which allows for measurement of the key click noise, substantially...
Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other more complex...
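For readers unfamiliar with the technique named in the abstract above, here is a minimal sketch of spherical k-means: data and centroids are kept on the unit sphere, and points are assigned by largest dot product (cosine similarity). It is a generic textbook variant, not the feature-learning pipeline of the paper. The function name, the random initialisation from data points, and the toy data are assumptions, and all input rows are assumed to be non-zero.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Cluster the rows of X by direction; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Project every sample onto the unit sphere (rows assumed non-zero).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialise centroids from k distinct samples (an assumed choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to the centroid with the largest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # leave the centroid unchanged if its cluster is empty
                centroids[j] = total / norm
    labels = np.argmax(X @ centroids.T, axis=1)
    return centroids, labels

# Two groups of vectors pointing in clearly different directions.
X = np.array([[1.0, 0.1], [2.0, 0.1], [0.1, 1.0], [0.2, 3.0]])
centroids, labels = spherical_kmeans(X, k=2)
```

Because only direction matters, the vectors `[1.0, 0.1]` and `[2.0, 0.1]` fall in the same cluster despite their different lengths.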
Speech signals are often contaminated by both room reverberation and ambient noise. In this contribution, we propose a nested generalized sidelobe canceller (GSC) beamforming structure, comprising inner and outer GSC beamformers (BFs) that decouple the speech dereverberation and the noise reduction operations. The BFs are implemented in the short-time Fourier transform (STFT) domain. Two alternative...
Reverberation affects the quality and intelligibility of distant speech recorded in a room. Direct-to-Reverberant Ratio (DRR) is a useful measure for assessing the acoustic configuration and can be used to inform dereverberation algorithms. We describe a novel DRR estimation algorithm applicable where the signal was recorded with two or more microphones, such as mobile communications devices and laptops...
Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates. We propose “deep NMF”, a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This...
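The multiplicative updates mentioned in the abstract above can be sketched as follows for the common Euclidean (Frobenius) objective. This is the standard iterative NMF that the paper takes as a starting point, not the proposed "deep NMF" network itself. The function name, the random initialisation, the iteration count and the toy matrix are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0):
    """Factorise a non-negative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3  # strictly positive start
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative without any explicit projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative matrix of rank 2.
V = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]])
W, H = nmf_multiplicative(V, rank=2)
print(np.linalg.norm(V - W @ H))
```

Each pass through the loop is one iteration of the kind that, according to the abstract, deep NMF unfolds into network layers with untied parameters.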