In this paper we summarize and discuss recent results in acoustic echo cancellation and noise reduction with emphasis on methods which combine both aspects. It is shown that echo control and noise reduction can support each other in a true synergy. The paper discusses fundamental issues of algorithm design and suggests that a frequency domain multi-microphone solution might be best suited to achieve...
The performance of hearing aids in noisy reverberant surroundings remains a major source of complaint and discomfort to wearers. Given the current capabilities and pace of development in microelectronics, the major problem is to find successful speech enhancement schemes. “Binaural unmasking” experiments demonstrate an enhancement advantage, due to binaural correlation properties, which can lower...
The development of an application of speech processing in a car environment is addressed. The main objective is to provide the user of a vehicular phone with a powerful and friendly bidirectional vocal interface. In particular, the paper focusses on the speech recogniser component of the interface as it was specifically designed and tuned to operate in the very hostile acoustic environment of a moving...
In this paper we propose a novel approach to cepstral smoothing for reducing musical noise fluctuations in binaural speech enhancement. Similar to other methods, our approach computes a preliminary spectral gain function using the magnitude-squared coherence function and applies an instantaneous weighting to the gain function in the cepstral domain. In this contribution, the weighting function is...
Traditionally, sensor arrays and spatial filtering aim to enhance individual sources by suppressing ambient noise and reverberation. In this paper, the exactly opposite problem is examined, that of suppressing individual sources in favour of the ambient sound and of the whole acoustic scene in general. We consider a compact circular sensor array which is embedded in a crowded ambient acoustic environment...
This paper proposes a directional noise suppressor with a specified constant beamwidth. A directional gain is calculated based on interchannel phase difference and combined with a spectral gain commonly used in single-channel noise suppressors (NSs). The beamwidth can be specified as passband edges of the directional gain. In order to implement frequency-independent constant beamwidth, frequency-proportionate...
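As a rough illustration of the idea in the abstract above (not the authors' implementation, whose details are truncated here), the sketch below computes a binary directional gain for one time-frequency bin of a two-microphone pair and multiplies it by a single-channel spectral gain. The function names, the far-field two-microphone model, the speed of sound, and the hard 0/1 passband are all assumptions made for this example; spatial aliasing at high frequencies is ignored.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for this sketch


def directional_gain(x1, x2, freq_hz, mic_spacing_m, passband_edge_deg):
    """Hypothetical binary directional gain for one time-frequency bin.

    x1, x2 are the complex spectral values of the two channels in that bin.
    """
    # Interchannel phase difference, wrapped to (-pi, pi].
    ipd = cmath.phase(x1 * x2.conjugate())
    # Largest phase difference a far-field source at the passband edge can
    # produce. It grows in proportion to frequency, so the angular
    # beamwidth stays the same at every frequency.
    max_ipd = (2.0 * math.pi * freq_hz * mic_spacing_m
               * math.sin(math.radians(passband_edge_deg)) / SPEED_OF_SOUND)
    return 1.0 if abs(ipd) <= max_ipd else 0.0


def combined_gain(x1, x2, freq_hz, mic_spacing_m, passband_edge_deg,
                  spectral_gain):
    """Product of the directional gain and a single-channel spectral gain."""
    return spectral_gain * directional_gain(
        x1, x2, freq_hz, mic_spacing_m, passband_edge_deg)
```

A bin whose two channels are in phase (a broadside source) keeps its spectral gain, while a bin whose phase difference corresponds to an angle outside the passband edges is set to zero.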
In this paper, a method of adaptive noise suppression combining spatially robust fixed beamforming and the TRINICON blind source separation algorithm is presented. A multichannel sensor array is first processed using complementary fixed beamformers into maximum and minimum SINR channels. The channels form the inputs to a single 2×2 second-order statistics TRINICON-BSS system which adaptively compensates...
In expressive TTS and voice transformation systems, implantation of expressive prosody derived from external out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech.
This paper describes the time-domain bandwidth extension (TBE) framework employed to code wideband and super-wideband speech in the newly standardized 3GPP EVS codec. The TBE algorithm uses a nonlinear harmonic modeling technique that incorporates principles of time-domain envelope-modulated noise mixing. At 13.2 kbps, the super-wideband coding of speech uses as low as 1.55 kbps for encoding the spectral...
This paper presents two new post-processing techniques to address limitations of the deployed low bit rate speech codecs in case of unvoiced speech and background noise, and in case of music. Both post-processing techniques enhance the spectrum of the decoded excitation signal without increasing the codec algorithmic delay. The paper discusses how to integrate the enhancement procedure of unvoiced...
A discontinuous transmission (DTX) system, which is widely adopted in speech codecs, is an important function for speech communication systems that can reduce the transmission bandwidth by at least half. Within a DTX system, the comfort noise generation (CNG) plays a key role in the overall quality. Critical performance parameters with respect to the CNG including the transition quality from active...
Speech intelligibility in noisy environments is still quite limited for cochlear implant (CI) users. Classical beamformers such as the Generalized Sidelobe Canceller (GSC) can provide large improvements in speech intelligibility for CI users. These algorithms have been adopted from hearing aids and multimedia applications into the CI field. However, their optimization taking into consideration the...
Speech coders operating in time domain can be extended with a frequency domain mode to improve encoding of music, even though this is challenging at low delay. In such a scenario, the short analysis window limits the benefit of the transform coder, while a delayless switch between the two coders constrains the system further. The paper presents an LPC and MDCT-based audio coder part of the new 3GPP...
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field conditions. We find that system performance can be improved significantly, with relative improvements up to 75% in far-field conditions, by employing a combination of multi-style training and a proposed novel formulation...
The ability to estimate the number of words spoken by an individual over a certain period of time is valuable in second language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework to achieve high accuracy is non-trivial in realistic/naturalistic scenarios due to various factors such as different styles of conversation or types of noise that...
We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin...
While most contributions on speech reinforcement only consider the presence of environmental noise, late reverberation can also severely degrade the intelligibility of speech. In this paper we address the problem of speech reinforcement in noisy and reverberant environments. We use a short-time version of a recently presented approximation of the speech intelligibility index, which we optimize locally...
In this paper, we use unconstrained frequency estimates (UFEs) from a noisy harmonic signal and propose two methods to estimate and track the pitch over time. We assume that the UFEs are multivariate-normally-distributed random variables, and derive a maximum likelihood (ML) pitch estimator by maximizing the likelihood of the UFEs over short time-intervals. As the main contribution of this paper,...
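To make the maximum likelihood step in the abstract above concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: the k-th unconstrained frequency estimate is modeled as k times the fundamental plus independent zero-mean Gaussian noise with known variance (a diagonal covariance). Under that model the ML estimate reduces to a weighted least-squares fit. The function name and the single-frame setting are hypothetical; the paper's estimator over short time intervals and its tracking methods may differ.

```python
def ml_pitch_from_ufes(ufes, variances):
    """Hypothetical ML fundamental-frequency estimate from one frame of UFEs.

    ufes[k-1] is the unconstrained estimate of the k-th harmonic frequency,
    and variances[k-1] is its assumed noise variance. With independent
    Gaussian errors, maximizing the likelihood is the same as minimizing
    sum_k (ufes[k-1] - k * f0)**2 / variances[k-1], which has the
    closed-form solution returned below.
    """
    if len(ufes) != len(variances) or not ufes:
        raise ValueError("ufes and variances must be non-empty and equal length")
    numerator = sum(k * w / v
                    for k, (w, v) in enumerate(zip(ufes, variances), start=1))
    denominator = sum(k * k / v
                      for k, v in enumerate(variances, start=1))
    return numerator / denominator
```

With noise-free harmonics at 100, 200, and 300 Hz the estimate is 100 Hz, and a harmonic with a large variance contributes little to the result, which is the behavior one would expect from a likelihood-weighted fit.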
In this work, we investigate the efficacy of Micro Electro-Mechanical System (MEMS) microphones, a newly developed technology of very compact sensors, for multichannel speech enhancement. Experiments are conducted on real speech data collected using a MEMS microphone array. First, the effectiveness of the array geometry for noise suppression is explored, using a new corpus containing speech recorded...
A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...