This paper presents the results of quality and intelligibility assessment of speech masked by stationary and nonstationary noise. A subjective speech quality assessment technique is used to show that the masking ability of white noise is lower than that of pink and even brown noise when the SNR is less than 0 dB. Two algorithms for forming nonstationary noise are proposed. They are...
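As a rough illustration of the masking setup described in this abstract, the sketch below scales a noise signal so that the speech-to-noise power ratio hits a chosen SNR below 0 dB. It is a minimal sketch, not the authors' procedure: the function names (`mix_at_snr`, `measured_snr_db`, `brown_noise`), the sine tone standing in for speech, and the running-sum construction of brown noise are all assumptions made for this example, and pink noise is left out for brevity.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)

def white_noise(n, rng):
    """White Gaussian noise: independent samples with a flat spectrum."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def brown_noise(n, rng):
    """Brown (red) noise built as a running sum of white noise, mean removed."""
    out, acc = [], 0.0
    for _ in range(n):
        acc += rng.gauss(0.0, 1.0)
        out.append(acc)
    mean = sum(out) / n
    return [v - mean for v in out]

def mix_at_snr(speech, noise, target_db):
    """Scale noise so that 10*log10(P_speech / P_noise) == target_db, then add it.

    Returns the mixture and the scaled noise.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10)))
    scaled = [gain * v for v in noise]
    return [s + v for s, v in zip(speech, scaled)], scaled

def measured_snr_db(speech, noise):
    """SNR in dB computed from the two separate signals."""
    return 10 * math.log10(power(speech) / power(noise))

rng = random.Random(0)
n = 8000
# Hypothetical stand-in for a speech signal: a 200 Hz tone sampled at 8 kHz.
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(n)]
for name, make in (("white", white_noise), ("brown", brown_noise)):
    mixed, scaled = mix_at_snr(speech, make(n, rng), -5.0)
    print(name, round(measured_snr_db(speech, scaled), 2))
```

Because both noise types are scaled to the same overall power, any difference in masking ability would come from how that power is distributed over frequency, which is what a subjective comparison like the one above would examine.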
This paper deals with the localization of multiple sources from two-channel mixtures recorded in a reverberant environment. We introduce new angular spectrum-based methods relying on the signal-to-noise ratio (SNR) to estimate the time difference of arrival (TDOA) of each source. We propose and compare five ways of estimating the SNR in each time-frequency point and in each direction, using beamforming...
In this paper, an improved method of speech enhancement using Power Spectral Density (PSD) codebooks of clean speech and several types of noise is proposed. The proposed algorithm estimates the PSDs of speech and noise of unknown nature and evaluates the input Signal-to-Noise Ratio (SNR) by solving an over-determined set of equations, as in the previous version. However, the search method used for...
This paper aims at evaluating the performance of a “Lombard effect model” for improving speech intelligibility over a telephone channel. It is well known that the naturalness and intelligibility of speech degrade rapidly in communication channels, such as phone networks or public address systems. To reduce the degradation, a “Lombard effect mimicking” system has been proposed to modify the variations...
The Mel-frequency cepstrum coefficient (MFCC) is a widely used feature vector in speech signal processing. Its feature extraction procedure can be seen as a mapping function which transfers the input speech signals to output MFCC feature vectors. However, this function is too complex to analyze and even a simple approximation is not easy to obtain. This paper studies the effects of each MFCC feature extraction...
Transmission of synthetic aperture radar (SAR) data requires large bandwidth due to its inherently high data rate. Consequently, compression of the data is often required. In this paper, we propose a raw SAR data compression algorithm that employs a predictive coding scheme, based on the analysis-by-synthesis encoding method. The proposed algorithm is inspired by code excited linear prediction (CELP)...
We propose a novel, robust estimator for the probability of speech presence at each time-frequency point in the short-time discrete Fourier domain. While existing estimators perform quite reliably in stationary noise environments, they usually exhibit a large false-alarm rate in nonstationary noise that results in a great deal of noise leakage when applied to a speech enhancement task. The proposed...
Speech enhancement in nonstationary environments is a challenging problem. This paper addresses the problem of speech presence probability (SPP) estimation. Exploiting the fact that speech is approximately sparse in the time-frequency domain, we integrate time and frequency minimum tracking results to estimate the noise power spectral density and the a posteriori signal-to-noise ratio. A sparseness...
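To make the idea of minimum tracking concrete, the sketch below estimates a noise power spectral density by smoothing the noisy periodogram over time and taking a sliding-window minimum per frequency bin, then forms the a posteriori SNR as the ratio of the periodogram to that estimate. This is a simplified time-only tracker under stated assumptions, not the combined time and frequency method of the abstract: the function names, the smoothing factor `alpha`, the window length, and the toy input are hypothetical, and the bias compensation that practical minimum-statistics estimators apply is omitted.

```python
from collections import deque

def track_noise_psd(periodogram, alpha=0.8, window=5):
    """Estimate noise PSD per frame and bin by time-domain minimum tracking.

    periodogram: list of frames, each a list of |Y(k)|^2 values.
    alpha:       recursive smoothing factor (assumed value).
    window:      number of past frames searched for the minimum (assumed value).
    """
    n_bins = len(periodogram[0])
    smoothed = list(periodogram[0])
    history = deque(maxlen=window)  # keeps only the last `window` smoothed frames
    noise = []
    for frame in periodogram:
        smoothed = [alpha * s + (1 - alpha) * p for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        noise.append([min(h[k] for h in history) for k in range(n_bins)])
    return noise

def a_posteriori_snr(periodogram, noise, floor=1e-12):
    """A posteriori SNR: noisy periodogram divided by the noise PSD estimate."""
    return [[p / max(nz, floor) for p, nz in zip(pf, nf)]
            for pf, nf in zip(periodogram, noise)]

# Toy single-bin input: a noise floor of 1.0 with a short burst of 10.0.
frames = [[1.0], [1.0], [1.0], [10.0], [10.0], [1.0], [1.0]]
noise_est = track_noise_psd(frames)
snr = a_posteriori_snr(frames, noise_est)
print([round(v[0], 3) for v in noise_est])
print([round(v[0], 3) for v in snr])
```

The minimum ignores the short burst because the window still contains frames from before it, so the noise estimate stays at the floor while the a posteriori SNR rises during the burst. A burst longer than the window would eventually leak into the estimate, which is one reason practical trackers tune the window length carefully.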
Mismatch between training and test conditions deteriorates the performance of speech recognizers. This paper investigates the combination of parametric histogram equalization (pHEQ) and noise masking to compensate for the mismatch caused by additive noise. The proposed front-end maps the distribution of the observed power spectrum vectors to a target distribution. The target distribution matches the...
To address the difficulty of selecting a threshold in wavelet speech enhancement algorithms, a new adaptive threshold algorithm based on the minimum description length criterion is proposed in this paper. The algorithm is a completely data-driven method and has very strong adaptability. It has characteristics of no requirement for prior knowledge of noise level and nature, preset threshold and choosing threshold...
In this paper, a new iterative method of speech enhancement using Power Spectral Density (PSD) codebooks of clean speech and several types of noise is proposed. The proposed algorithm estimates the PSDs of speech and noise of unknown nature and evaluates the input Signal-to-Noise Ratio (SNR) by solving an over-determined set of equations. No Voice Activity Detection (VAD) or other means of noise...
This paper presents a precursor to an objective measure to predict speech intelligibility in binaural listening conditions. Such measures typically consist of a binaural pre-processing stage followed by intelligibility prediction using a monaural measure such as the Speech Intelligibility Index. In this work, an implementation of the equalization-cancellation process using Wiener filters is presented...
In this paper we describe a technique that uses adaptive gain control to achieve noise suppression in speech signals. The method used to map the dynamic range of the signal is based on the human auditory perceptual model. Since the processing is based on the model of human perception, the resulting noise suppressed speech is natural sounding. The computational complexity of the proposed method is...
This paper describes a method to increase speech intelligibility when the speech signal is being transmitted over telephone lines. In order to detect all factors which affect speech intelligibility, we use the telephone simulation tool in the ITU-T Software Tools Library release 2005 (STL2005) to identify the most problematic telephone-channel deteriorations. Of the various effects considered, additive noise...
As speech recognition and spoken language technologies are being transferred to real applications, the need for greater robustness against adverse noise is becoming increasingly apparent. This paper investigates a robust speech recognition method based on adaptive noise cancelling (ANC). It obtains the enhanced speech signal by applying a variable-step adaptive noise cancelling algorithm to reduce...
Speech endpoint detection in strong-noise environments plays an important role in speech signal processing. The Hilbert-Huang Transform (HHT) is an adaptive and efficient transformation method based on the local characteristics of signals. It is particularly suitable for analyzing non-linear and non-stationary signals such as speech. In this paper, we chose the noisy speech signal...
We present an approach to model-based voice activity detection (VAD) for harsh environments. By using mel-frequency cepstral coefficient features extracted from clean and noisy speech samples, an artificial neural network is trained optimally in order to provide a reliable model. There are three main aspects to this study: First, in addition to the developed model, recent state-of-the-art VAD methods...
Formant frequency is one of the most important speech features, with widespread applications in speech recognition, synthesis, and compression. In this paper, a new time-frequency domain scheme for the estimation of formant frequencies from noise-corrupted speech signals is presented. In order to overcome the adverse effect of noise, instead of conventional autocorrelation function (ACF), a...
A new technique for enhancing an audio signal from a noisy nonstationary environment is presented in the paper. An autoregressive (AR) model is used to efficiently exploit the temporally correlated information of audio and noise signals during a short stationary frame. The temporal models of signals and noisy process are combined to construct a state space. The state space appropriately describes that the...
We investigate a general framework for noise reduction which consists in controlling the level of signal distortion while reducing the level of noise. A parameterized non-causal filter that allows for tuning the signal distortion and noise reduction inversely is obtained and is referred to as parameterized multichannel non-causal Wiener filter (PMWF) herein. The same optimization problem leads to...