Speech intelligibility is an important aspect of speech transmission, but when speech coding standards are compared, often only the quality is evaluated using perceptual tests. In this study, the performance of three wideband speech coding standards, adaptive multi-rate wideband (AMR-WB), G.718, and enhanced voice services (EVS), is evaluated in a subjective intelligibility test. The test covers different...
This paper presents a multichannel dereverberation algorithm that only uses coherent acoustic channels. In the framework of multi-input/output inverse theorem (MINT), the equalization performance varies depending on the length of the input acoustic channels. However, only the portion of the observed channel that resembles the true acoustic channel contributes to performance enhancement when measurement...
This paper addresses the problem of relative transfer function (RTF) estimation in the presence of stationary noise. We propose an RTF identification method based on segmental power spectral density (PSD) matrix subtraction. First, the multichannel microphone signals are divided into segments corresponding to speech-plus-noise activity and noise-only activity. Then, the subtraction of two segmental PSD matrices...
We propose a novel sparse representation for heavily underdetermined multichannel sound mixtures, i.e., with many more sources than microphones. The proposed approach operates in the complex Fourier domain, thus preserving spatial characteristics carried by phase differences. We derive a generalization of K-SVD which jointly estimates a dictionary capturing both spectral and spatial features, a sparse...
Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is as challenging a task as speech separation itself. In this paper, we propose a supervised learning architecture which combines these two problems concisely. The proposed algorithm is based on a deep stacking network (DSN), which provides a method of stacking simple processing modules in building...
To approximate the speech quality of a given speech enhancement system, most of the existing instrumental metrics rely on the calculation of a distortion metric defined between the clean reference signal and the enhanced signal in the spectral amplitude domain. Several recent studies have demonstrated the effectiveness of employing a phase modification stage in single-channel speech enhancement showing...
Repetition is a fundamental element in generating and perceiving structure in audio. Especially in music, structures tend to be composed of patterns that repeat through time (e.g., rhythmic elements in a musical accompaniment), and also frequency (e.g., different notes of the same instrument). The auditory system has the remarkable ability to parse such patterns by identifying repetitions within the...
In this paper we consider an acoustic scenario with a desired source and a directional interference picked up by hearing devices in a noisy and reverberant environment. We present an extension of the binaural multichannel Wiener filter (BMWF), by adding an interference rejection constraint to its cost function, in order to combine the advantages of spatial and spectral filtering while mitigating directional...
Model-based single-channel source separation (SCSS) is an ill-posed problem requiring source-specific prior knowledge. In this paper, we use representation learning and compare general stochastic networks (GSNs), Gauss Bernoulli restricted Boltzmann machines (GBRBMs), conditional Gauss Bernoulli restricted Boltzmann machines (CGBRBMs), and higher order contractive autoencoders (HCAEs) for modeling...
Traditional sound event recognition methods based on informative front end features such as MFCC, with back end sequencing methods such as HMM, tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning...
In single-channel speech enhancement, the spectral amplitude of the noisy signal is often modified while the noisy spectral phase is directly employed for signal reconstruction. Recently, additional improvement in speech enhancement performance has been reported when the noisy phase is modified. In this work, we propose a Bayesian estimator for the phase of harmonics given the noisy speech. The proposed...
Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there still remains a significant gap between speech identification performance of CI users in reverberation compared to that in anechoic quiet conditions. Alternatively, automatic speech recognition (ASR) systems have seen significant improvements in recent years resulting in robust speech recognition...
In this paper, a delayless speech enhancement scheme with zero phase distortion is proposed. It is based on a cascade of adaptive filters that predicts periodic components with a significant auto-correlation for lags larger than a value D. The adaptive filter is positioned at the output of a speech enhancement algorithm, to adjust the phase of the periodic components to the noisy signal, and to remove...
We propose a sparse hidden Markov model (HMM)-based single-channel speech enhancement method that models the speech and noise gains accurately in both stationary and nonstationary environments. The objective function is augmented with an lp regularization term, resulting in a sparse autoregressive HMM (SARHMM). The method encourages sparsity in the speech and noise modeling, which eliminates the...
This paper describes new time domain techniques for concealing packet loss in the new 3GPP Enhanced Voice Services codec. Enhancements to the existing ACELP concealment methods include guided, improved pitch prediction, increased flexibility and accuracy of pulse resynchronization. Furthermore, the new method of separate linear predictive (LP) filter synthesis aims for sound quality improvement in...
This paper presents a method to enhance a speech signal disturbed by wind noise. The wind noise is generated by turbulence in an air stream close to the microphone which picks up the desired speech signal. As the majority of speech enhancement algorithms work in the frequency domain, the short term power spectrum (STPS) of the unwanted noise must be estimated to reduce the wind noise. Conventional...
The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over the current state of the art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature...
In industrial noise environments, the use of assistive listening headsets is a means to provide adequate access to voice communication while wearing hearing protection. This paper presents a performance evaluation and comparison of two different methods for providing binaural speech enhancement in real industrial noise scenarios. The investigated binaural methods based on differential beamforming...
Recently, the ideal binary mask has been introduced in the modulation domain by extending the ideal channel selection method to modulation channel selection [1]. This new method shows substantial improvement in speech intelligibility but less than its predecessor despite the higher complexity. Here, we extend the previous finding from [1] and provide a more direct comparison of binary masking in the...
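For readers unfamiliar with the term, the following is a minimal sketch of the conventional ideal binary mask in the time-frequency domain, not the modulation-domain method of the abstract above. It assumes the clean-speech and noise powers are known per unit (hence "ideal"); the function name `ideal_binary_mask`, the parameter `lc_db` (local SNR criterion in dB), and the toy values are illustrative assumptions and do not come from the paper.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR in dB exceeds lc_db.

    speech_power and noise_power are equally shaped lists of rows holding
    non-negative power values, one value per time-frequency unit.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                keep = False   # no speech energy: discard the unit
            elif n <= 0.0:
                keep = True    # speech without noise: keep the unit
            else:
                keep = 10.0 * math.log10(s / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


# Toy 2x2 example (hypothetical power values).
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

Multiplying the noisy spectrogram by such a mask retains only the units dominated by speech; raising `lc_db` makes the selection stricter.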
The ability of robots to listen to several things at once with their own “ears”, that is, robot audition, is an important factor in improving interaction and symbiosis between humans and robots. The critical issue in robot audition is real-time processing and robustness against noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper first...