The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Regarding the non-negativity property of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NM-F fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence...
In this paper we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude domain soft mask is constructed from these features. Extensive tests are run on 73 k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally...
Non-negative matrix factorization (NMF) has emerged as a promising approach for single-channel speech separation. In this paper, we propose a new method of discriminative learning of NMF. In contrast to conventional approaches where the basis vectors are learned independently on clean signals from each speaker, our approach optimizes all basis vectors jointly to reconstruct both clean signals and...
Supervised learning based speech separation has shown considerable success recently. In its simplest form, a discriminative model is trained as a time-frequency masking function, where the training target is an ideal mask. Ideal masks, such as the ideal binary masks, are structured spectro-temporal patterns. However, previous formulations do not model prominent output structure. In this paper, we...
In many present speech separation approaches, the separation task is formulated as a binary classification problem. Several classification-based approaches have been proposed and performed satisfactorily. However, they do not explicitly model the correlation in time and each time-frequency (T-F) unit is still classified individually. As we know, the speech signal has a very rich time series and temporal...
Binary time-frequency masking and model-based nonnegative matrix factorization (NMF) are two common approaches to speech separation. However, binary masking often suffers from poor perceptual quality, while NMF typically requires pretrained models for both speech and noise and frequently does not perform well. In this paper we examine whether a single or two-stage approach should be used for performing...
Speech separation is a challenging problem at low signal-to-noise ratios (SNRs). Separation can be formulated as a classification problem. In this study, we focus on the SNR level of −5 dB in which speech is generally dominated by background noise. In such a low SNR condition, extracting robust features from a noisy mixture is crucial for successful classification. Using a common neural network classifier,...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.