This paper presents a nonnegative matrix factorization (NMF) components classification algorithm for single-channel speech and music separation. Music-only and music-speech mixture segments are first classified from the audio stream via an audio segmentation technique. Then NMF is applied for signal decomposition. The basis matrix of the NMF output of music-only segments provides the prior knowledge...
The concept of the two-dimensional spectro-temporal modulation filtering of the auditory model [1] is implemented for the FFT spectrogram. It analyzes the spectrogram in terms of the temporal dynamics and the spectral structures of the sound. The overlap and add (OLA) method, which is more convenient and reliable than the iterative-projection method proposed in [1], is used to invert the FFT spectrogram...
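As a rough illustration of the overlap-and-add idea mentioned in this abstract, the sketch below splits a signal into windowed frames and rebuilds it by summing the overlapping frames and dividing by the accumulated window. It is not the paper's method: the spectrogram and modulation-filtering steps are omitted, and the function names, the periodic Hann window, the frame size, and the hop are all assumptions chosen for the example.

```python
import math

def hann(n):
    # Periodic Hann window (an assumed choice; the abstract names no window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def split_frames(x, size, hop):
    # Cut x into overlapping frames and apply the window to each one.
    w = hann(size)
    return [[x[s + i] * w[i] for i in range(size)]
            for s in range(0, len(x) - size + 1, hop)]

def overlap_add(frames, size, hop):
    # Sum the frames at their original offsets, then divide by the summed
    # window so that overlapping regions are not counted twice.
    w = hann(size)
    length = hop * (len(frames) - 1) + size
    out = [0.0] * length
    norm = [0.0] * length
    for k, frame in enumerate(frames):
        for i in range(size):
            out[k * hop + i] += frame[i]
            norm[k * hop + i] += w[i]
    # Samples with zero window weight (here only the first one) stay 0.
    return [o / n if n > 1e-12 else 0.0 for o, n in zip(out, norm)]

signal = [math.sin(0.3 * i) for i in range(64)]
frames = split_frames(signal, size=16, hop=8)
rebuilt = overlap_add(frames, size=16, hop=8)
```

In a real system each frame would be transformed, modified, and inverse-transformed between `split_frames` and `overlap_add`; with unmodified frames the reconstruction matches the input wherever the window weight is nonzero.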
In this paper, we propose a psychoacoustic approach towards enhancing speech intelligibility in noise. Understanding the relationship between the short-term spectral movement of a sound and a listener's sensitivity towards it, we conjecture that humans rely greatly on Inter-Phoneme Spectral Gradients (IPSGs) to distinguish each phoneme, especially when the short-term speech spectrum is masked by extremely...
We present an algorithm to find a low-dimensional decomposition of a spectrogram by formulating this as a regularized non-negative matrix factorization (NMF) problem with a regularization term chosen to encourage independence. This algorithm provides a better decomposition than standard NMF when the underlying sources are independent. It is directly applicable to non-square matrices, and it makes...
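For orientation, here is a minimal sketch of standard NMF with the well-known multiplicative updates for squared error, applied to a small non-square matrix. It is the plain baseline only: the independence-encouraging regularization term described in this abstract is not included, and the function names, the toy matrix, the rank, and the iteration count are assumptions made for the example.

```python
import random

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def squared_error(V, W, H):
    # Squared Frobenius norm of V - W H.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    # Factor a nonnegative n x m matrix V into W (n x rank) and H (rank x m).
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy 3 x 4 "spectrogram" (made-up values, rows are frequency bins).
V = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 3.0, 0.0, 3.0],
     [2.0, 1.0, 4.0, 3.0]]
W0, H0 = nmf(V, rank=2, iters=0)   # random initialization only
W, H = nmf(V, rank=2, iters=200)   # after the multiplicative updates
```

Because the updates only multiply and divide nonnegative quantities, `W` and `H` stay nonnegative throughout, which is the property that makes the factors interpretable as spectral bases and activations.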
We present an algorithm to dereverberate single- and multi-channel audio recordings. The proposed algorithm models the magnitude spectrograms of clean audio signals as histograms drawn from a multinomial process. Spectrograms of reverberated signals are obtained as histograms of draws from the PDF of the sum of two random variables, one representing the spectrogram of clean speech and the second the...
In this paper, the problem of extracting periodic signals, like voiced speech or tones in music, from noisy observations or mixtures of periodic signals is considered, and, in particular, the problem of designing filters for such a task. We propose a novel filter design that 1) is specifically aimed at extracting periodic signals, 2) is optimal given the observed signal and thus signal-adaptive, and...
In this paper, we propose a novel method of refining the time-domain synthesis of individual source estimates from a single channel mixture. Employing a closed-loop architecture, the algorithm refines the synthesis of each source by iteratively estimating the phase of the sources, given the estimates of the source magnitude spectra and a single channel time-domain mixture. The performance of the algorithm...
An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing) prior to decoding, and to replace the missing ones by clean speech estimates. We present a novel method based on techniques from the field of Compressive Sensing to obtain these clean speech estimates. Unlike previous imputation frameworks which...
We introduce a non-negative matrix factorization technique which learns speech features with temporal extent in the presence of non-stationary noise. Our proposed technique, namely Sparse convolutive robust non-negative matrix factorization, is robust in the presence of noise due to our explicit treatment of noise as an interfering source in the factorization. We derive multiplicative update rules...
Transcription of music is the process of generating a symbolic representation such as a score sheet or a MIDI file from an audio recording of a piece of music. A statistical machine learning approach for detecting note onsets in polyphonic piano music is presented. An area from the spectrogram of the sound is concatenated into one feature vector. A cascade of boosted classifiers is used for dimensionality...
This paper introduces an algorithm to separate speech streams from a single-channel speech mixture. Most current speech segregation algorithms allocate speech regions to participating speakers depending on which speaker dominates in which spectro-temporal region. The proposed method is a different approach to speech segregation, in that it separates the participating speaker streams rather than decide...
In previous work we introduced a new missing data imputation method for ASR, dubbed sparse imputation. We showed that the method is capable of maintaining good recognition accuracies even at very low SNRs provided the number of mask estimation errors is sufficiently low. Especially at low SNRs, however, mask estimation is difficult and errors are unavoidable. In this paper, we try to reduce the impact...
This paper presents a blind dereverberation method designed to recover the subband envelope of an original speech signal from its reverberant version. The problem is formulated as a blind deconvolution problem with non-negative constraints, regularized by the sparse nature of speech spectrograms. We derive an iterative algorithm for its optimization, which can be seen as a special case of the non-negative...
The wavelet transform has become a powerful tool for signal analysis and is widely used in many applications, including signal detection and de-noising. Wavelet thresholding de-noising techniques provide a new way to reduce background noise in speech signals. However, soft thresholding is best in reducing noise but worst in preserving edges, and hard thresholding is best in preserving edges but worst...
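The two classical thresholding rules contrasted in this abstract can be sketched as follows. The functions act on a plain list standing in for wavelet coefficients; the wavelet transform itself, the threshold value, and the function names are assumptions for illustration, and the paper's own proposed rule is not reproduced here.

```python
def hard_threshold(coeffs, t):
    # Keep coefficients whose magnitude exceeds t unchanged; zero the rest.
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    # Zero small coefficients and shrink the surviving ones toward zero by t.
    out = []
    for c in coeffs:
        mag = abs(c) - t
        if mag > 0:
            out.append(mag if c > 0 else -mag)
        else:
            out.append(0.0)
    return out

coeffs = [3.0, -0.5, 1.5, -2.0, 0.2]   # made-up detail coefficients
hard = hard_threshold(coeffs, 1.0)     # [3.0, 0.0, 1.5, -2.0, 0.0]
soft = soft_threshold(coeffs, 1.0)     # [2.0, 0.0, 0.5, -1.0, 0.0]
```

Hard thresholding leaves large coefficients untouched, which preserves sharp transitions, while soft thresholding also shrinks them, which gives a smoother result at the cost of attenuating edges.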
The gain function of traditional enhancement algorithms estimates every spectral component of the signal, which introduces relatively more speech distortion. To improve speech enhancement at low signal-to-noise ratio (SNR), this paper proposes an optimal speech enhancement scheme. Based on auditory perception properties, no estimator for noise masked spectrum and classical enhancement...
This paper proposes a two-stage hybrid speech enhancement system with nonuniform subbands. Frequency bins after the Fourier transform are nonuniformly grouped to reduce the computation needed to calculate the spectral gain. The first stage includes a soft-decision gain modification applied to the Ephraim-Malah gain function based on minimum mean square error estimation (MMSE) and a psychoacoustic masking...
In this work, we present a new mask estimation technique that uses a neural network classifier to determine the reliability of spectrographic elements. In addition, several kinds of classification features were compared that make no assumptions about the corrupting noise signal, but rather exploit spectrographic characteristics of the speech signal. The performance of the proposed method...
In this paper, we perform noise suppression based on an approximate Karhunen-Loeve transform (KLT). The discrete cosine transform (DCT) has been a good candidate for an approximate KLT when the signal is modeled as an autoregressive process. However, for nonstationary signals, the wavelet transform is more capable than the DCT of approximating the KLT. To calculate the approximate KLT, we first represent the signal...