In hands-free mobile communication, speech quality is often degraded by surrounding noise. This paper introduces an improved version of the Minimum Mean Square Error (MMSE) noise estimator. Noise spectrum estimation is a crucial element of speech recognition systems. Our proposed noise estimation method is based on a popular search algorithm used in software engineering called...
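The abstract is truncated before the estimator itself is described, but the general idea of recursive noise-spectrum tracking can be sketched as follows (a generic illustration, not the paper's MMSE method; the update rule, `alpha`, and `threshold` are illustrative assumptions):

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_psd, alpha=0.9, threshold=2.0):
    """Recursively update a noise power-spectral-density estimate.

    Bins where the noisy power greatly exceeds the current noise estimate
    are treated as speech-dominated and the estimate is frozen there;
    elsewhere the estimate is smoothed toward the observed power.
    """
    speech_present = noisy_psd > threshold * noise_psd
    updated = alpha * noise_psd + (1.0 - alpha) * noisy_psd
    return np.where(speech_present, noise_psd, updated)
```

Called once per STFT frame, this tracks slowly varying noise while ignoring speech bursts; real MMSE estimators replace the hard threshold with a soft speech-presence probability.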
This work proposes a technique for predicting pitch from Mel-frequency cepstral coefficient (MFCC) vectors. Previous pitch prediction methods are based on statistical models such as Gaussian mixture models and hidden Markov models. In this paper, we propose a three-step method to estimate pitch from MFCC vectors. First, the Mel-filterbank energies (MFBEs) are estimated from the MFCC vectors. Secondly,...
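The first step, recovering Mel-filterbank energies from MFCC vectors, amounts to inverting the truncated DCT and the log compression. A minimal sketch (function names and the `n_mels` default are illustrative assumptions; the inversion is only approximate when higher-order cepstral coefficients have been discarded):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k, column m = s_k*cos(pi*(m+0.5)*k/n)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.cos(np.pi * (m + 0.5) * k / n)
    D[0] *= np.sqrt(1.0 / n)
    D[1:] *= np.sqrt(2.0 / n)
    return D

def mfcc_to_mfbe(mfcc, n_mels=26):
    """Approximately invert MFCCs back to Mel-filterbank energies:
    zero-pad the truncated cepstrum, apply the inverse (transposed)
    orthonormal DCT, and exponentiate to undo the log."""
    c = np.zeros(n_mels)
    c[:len(mfcc)] = mfcc
    log_mfbe = dct_matrix(n_mels).T @ c   # inverse of orthonormal DCT
    return np.exp(log_mfbe)
```

When all cepstral coefficients are kept, the round trip is exact; with the usual 12–13 kept coefficients the recovered MFBEs are a smoothed version of the originals.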
In this demonstration, we aim to present our recent implementation results and to provide an evaluation testbed through which users can experiment with and compare the outputs of the distributed speech enhancement algorithms in [1–3]. The system allows a user to assess the merits of these algorithms in any acoustic setup. The multi-channel Wiener filter (MWF) is a well-known noise reduction algorithm for...
The demonstration presents a real-time mockup of a smartphone-based hearing aid with combined noise and acoustic feedback reduction. The designed reduction algorithm is based on a spectral weighting approach, which makes it very robust to rapid changes in the feedback path caused either by displacement of the speaker/microphone or by room acoustics. The aim of the demonstration is to show the potential of the implemented...
We demonstrate the feasibility of a real-time implementation of advanced binaural noise reduction algorithms on a single-board computer, the Raspberry Pi. The implementation of the considered algorithms is realized in Simulink, a graphical programming add-on to the integrated development environment Matlab. Using a complementary support package for Simulink, the Raspberry Pi is connected/hosted...
Speaker localization using microphone arrays is typically based on the expected phase and amplitude differences between microphones as a function of the wave arrival direction. However, in rooms with significant reverberation, the direct sound is contaminated by reflections and localization often fails. Recently, a reverberation-robust localization method was proposed, which uses only the direct-path...
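The conventional phase/time-difference localization the abstract refers to can be illustrated, for a two-microphone far-field setup, by estimating the time difference of arrival (TDOA) from a cross-correlation peak (a textbook sketch, not the reverberation-robust direct-path method the paper proposes; all parameter names are illustrative):

```python
import numpy as np

def estimate_doa(x1, x2, fs, mic_dist, c=343.0):
    """Two-microphone far-field direction of arrival, in degrees from
    broadside, estimated from the cross-correlation peak."""
    corr = np.correlate(x1, x2, mode='full')
    # Peak lag is negative when x2 is a delayed copy of x1.
    lag = np.argmax(corr) - (len(x2) - 1)
    tdoa = -lag / fs                      # positive: wavefront hits mic 1 first
    sin_theta = np.clip(tdoa * c / mic_dist, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```

In a reverberant room, reflections add spurious correlation peaks, which is exactly the failure mode motivating the direct-path-dominance approach the paper builds on.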
The fundamental frequency is one of the prosodic parameters, and many algorithms have been developed for estimating the fundamental frequency of speech signals. Most of them provide good results on good quality speech signals, but their performance degrades when dealing with noisy signals. Moreover, although some provide a probability for the voicing decision, none of them indicate how reliable the...
Forensic Voice Comparison (FVC) is increasingly using the likelihood ratio (LR) in order to indicate whether the evidence supports the prosecution (same-speaker) or the defense (different-speakers) hypothesis. Nevertheless, the LR is subject to some practical limitations, due both to its estimation process itself and to a lack of knowledge about the reliability of this (practical) estimation process. It is...
Pitch is an important characteristic of speech and is useful for many applications. However, it is still challenging to estimate pitch in strong noise. In this paper, we propose a joint training approach to pitch estimation. First, a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) is trained to map noisy speech features to clean ones. Second, the pitch estimation is also...
Reverberation and noise are known to be the two most important culprits for poor performance in far-field speech applications, such as automatic speech recognition. Recent research has suggested that reverberation-aware speech enhancement (or speech technologies, in general) could be used to improve performance. However, recent results also show existing blind room acoustics characterization algorithms...
Classic approaches to multi-channel signal enhancement rely on model assumptions regarding speech source relative transfer functions and noise covariance matrix, or on estimates thereof obtained in, e.g., speech pauses. To alleviate these constraints, we here investigate an approach to adaptive estimation of the speech (target) source and noise related acoustic parameters based on localized speech...
Here we propose online adaptive beamforming for automatic speech recognition (ASR) in meetings in noisy, reverberant environments. The proposed method is based on recently developed mask-based beamforming, in which accurate mask estimation and diarization are paramount. Real-world experiments have shown that mask-based beamforming enables accurate ASR in meetings under low noise and reverberation with...
The Wiener filter is a well-known signal processing method for improving a noisy signal's quality. The Wiener filter requires either knowledge of or estimates of the power spectra of the signal-of-interest and of the undesired noise, leading to implementation challenges. In this paper, we show how a recently-developed second-order signal quantity termed the panorama can be employed to compute the...
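The classical Wiener gain the paper builds on is H(f) = S(f) / (S(f) + N(f)), applied per frequency bin. A minimal sketch of that baseline (the panorama-based estimation the paper introduces is not reproduced here; `floor` is an illustrative numerical safeguard):

```python
import numpy as np

def wiener_gain(signal_psd, noise_psd, floor=1e-10):
    """Per-bin Wiener gain H = S / (S + N)."""
    return signal_psd / np.maximum(signal_psd + noise_psd, floor)

def wiener_filter_frame(noisy_spectrum, signal_psd, noise_psd):
    """Apply the per-bin Wiener gain to one STFT frame of the noisy signal."""
    return wiener_gain(signal_psd, noise_psd) * noisy_spectrum
```

The gain tends toward 1 in bins dominated by the signal of interest and toward 0 in noise-dominated bins; the implementation challenge the abstract mentions is obtaining the two PSDs in the first place.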
Traditional speech separation systems enhance only the magnitude response of noisy speech. Recent studies, however, have shown that perceptual speech quality is significantly improved when both magnitude and phase are enhanced. These studies have not determined whether phase enhancement remains beneficial in environments that contain reverberation as well as noise. In this paper, we present an approach...
Speech recognition performance deteriorates in the face of unknown noise. Speech enhancement offers a solution by reducing the noise in speech at runtime. However, it also introduces artificial distortions to the speech signals. In this paper, we aim to reduce the artifacts that have adverse effects on speech recognition. With this motivation, we propose a modification scheme including smoothing adaptation...
Dysarthria is a motor speech impairment, often characterized by speech that is generally indiscernible by human listeners. Assessment of the severity level of dysarthria provides an understanding of the patient's progression in the underlying cause and is essential for planning therapy, as well as improving automatic dysarthric speech recognition. In this paper, we propose a non-linguistic manner...
Noise reduction technologies have been applied to enhance the intelligibility of voice communications. However, existing methods are vulnerable to complex non-stationary noisy conditions, which are commonly encountered in real-world hands-free scenarios. Additionally, the existing methods do not take full advantage of the deployment of multi-channel microphone arrays on the burgeoning high-end...
Systems based on i-vectors represent the current state-of-the-art in text-independent speaker recognition. In this work we introduce a new compact representation of a speech segment, similar to the speaker factors of Joint Factor Analysis (JFA) and to i-vectors, that we call “e-vector”. The e-vectors derive their name from the eigenvoice space of the JFA speaker modeling approach. Our working hypothesis...
In the era of deep learning, although beamforming-based multi-channel signal processing is still very helpful, it has been reported that single-channel robust front-ends usually do not benefit deep learning models, because the layer-by-layer structure of deep learning models provides a feature extraction strategy that automatically derives powerful noise-resistant features from primitive raw data for senone
Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCMs) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...
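The mask-to-SCM-to-MVDR pipeline described above can be sketched as follows (a generic illustration under the common convention of taking the steering vector as the principal eigenvector of the speech SCM; all names and shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def masked_scm(stft, mask):
    """Mask-weighted spatial covariance matrices.

    stft: complex array (mics, freq, frames); mask: (freq, frames) in [0, 1].
    Returns one (mics, mics) SCM per frequency bin, shape (freq, mics, mics).
    """
    num = np.einsum('mft,nft,ft->fmn', stft, stft.conj(), mask)
    return num / np.maximum(mask.sum(axis=-1), 1e-10)[:, None, None]

def mvdr_weights(scm_speech, scm_noise):
    """Per-bin MVDR beamformer w = (Phi_n^-1 h) / (h^H Phi_n^-1 h),
    with steering vector h = principal eigenvector of the speech SCM."""
    F, M, _ = scm_speech.shape
    w = np.zeros((F, M), dtype=complex)
    for f in range(F):
        _, vecs = np.linalg.eigh(scm_speech[f])
        h = vecs[:, -1]                       # principal eigenvector
        num = np.linalg.solve(scm_noise[f], h)
        w[f] = num / (h.conj() @ num)
    return w
```

The normalization enforces the distortionless constraint w^H h = 1, so the target direction passes unchanged while noise-dominated directions are attenuated; the quality of the whole chain hinges on the TF masks used to weight the SCMs.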