This paper is dedicated to the memory of Steven L. Grant for his exceptional contributions to the echo cancellation problem. Regularization is mandatory in all ill-posed problems, especially in the presence of additive noise. In this paper, we consider the regularized recursive least-squares (RLS) algorithm and present a method to find its regularization parameter, depending on the signal-to-noise...
The extended Kalman filter frequency tracker converges to an optimal estimate with very low computational complexity only if exact knowledge of the initial state is available and the model is accurate; otherwise it may diverge. On the other hand, particle filter performance is not so sensitive to the initial state estimate and yields a more robust estimation even if no prior knowledge about...
This paper describes a high-quality voice activity detection algorithm using wavelet energy entropy. In this algorithm, the partitioning of the speech band into sub-bands is performed via a bank of adaptive band-partitioning filters whose coefficients are derived from a wavelet tree structure. The adaptive band-partitioning models have been proposed to perform endpoint detection of isolated digit utterances...
Within the framework of computational auditory scene analysis (CASA), a parameter mask estimator based on deep neural networks (DNN) is proposed for automatic speech recognition (ASR) in noisy environments. This paper addresses robustness in binaural machine speech recognition through speech energy estimation using a DNN. An ideal parameter mask (IPM) is introduced as the goal of the DNN estimator,...
In this paper, a new combination of features and normalization methods is investigated for robust biometric speaker identification. Mel Frequency Cepstral Coefficients (MFCC) are efficient for speaker identification in clean speech while Power Normalized Cepstral Coefficients (PNCC) features are robust for noisy environments. Therefore, combining both features together is better than taking each one...
This work analyzes the excitation source to characterize glottal stops using the integrated linear prediction (ILP) residual, derived by a pitch-synchronous (PS) approach. The glottal stop consonant is produced by a laryngeal gesture in the form of a constricted glottis. This pressed glottal configuration leads to period-to-period irregularities, aperiodicity, and asymmetry. Normalized crosscorrelation coefficient...
In this paper, we propose a new technique for voice activity detection (VAD) using the lacunarity index combined with empirical mode decomposition (EMD). In the preprocessing stage of the proposed framework, the noisy speech signal is decomposed into several intrinsic mode functions (IMFs) using EMD. After that, the more informative IMFs are selected using spectral flatness measurement...
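As a rough illustration of the spectral flatness measure mentioned in the abstract above, the sketch below computes the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum. It is a generic textbook definition and not the authors' IMF selection procedure; the function names, the naive DFT, and the `eps` floor are assumptions made for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tone-like spectrum. eps (an assumption of this
    sketch) keeps the logarithm finite for empty bins.
    """
    power = [p + eps for p in power_spectrum(frame)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# Hypothetical test frames: a pure tone and a unit impulse.
N = 128
tone = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
impulse = [1.0] + [0.0] * (N - 1)
```

A pure tone concentrates its energy in one bin, so its flatness is close to 0, while a unit impulse has a perfectly flat spectrum and a flatness close to 1.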
In this paper, we address the transformation of whispered speech into natural voiced speech. Representative state-of-the-art solutions and a baseline algorithm are first reviewed. For the most part, these solutions fall in the realm of voice conversion strategies since the output signal is obtained as a projection of an input signal. In this paper, we propose a different approach that addresses...
In this paper, a combination of a statistical model-based approach and a non-negative matrix factorization (NMF)-based approach with on-line update of speech and noise bases is proposed for speech enhancement. Template-based approaches are more robust and perform better on non-stationary noises than statistical model-based approaches. However, the template-based approach is dependent on a priori...
Voice activity detection in mobile environments does not perform well because of arbitrary noises. In this paper, a robust voice activity detection framework for mobile devices is proposed. Unsupervised clustering and discriminative weight training of each cluster are employed to model various characteristics of arbitrary noises.
The Kalman filter has a wide range of applications, noise removal from corrupted speech being one of them. The filter performance is subject to the accurate tuning of its parameters, namely the process noise covariance, Q, and the measurement noise covariance, R. In this paper, the Kalman filter has been tuned to get a suitable value of Q by defining the robustness and sensitivity metrics, and then...
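To make the roles of Q and R in the abstract above concrete, here is a minimal sketch of a scalar Kalman filter for a random-walk state. It does not reproduce the paper's robustness and sensitivity metrics or its tuning procedure; the model, the function names, and the numeric values of `q` and `r` are assumptions chosen only for illustration.

```python
import random


def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = x[k-1] + w, z[k] = x[k] + v.

    q is the process noise variance (Q) and r is the measurement noise
    variance (R). Returns the filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is modeled as constant, uncertainty grows by Q.
        p = p + q
        # Update: the Kalman gain weighs the measurement against the prediction.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Hypothetical data: a constant level observed in Gaussian noise (variance 0.25).
random.seed(0)
truth = [1.0] * 200
noisy = [t + random.gauss(0.0, 0.5) for t in truth]

well_tuned = kalman_1d(noisy, q=1e-5, r=0.25)  # small Q: trusts the model, smooths heavily
loose = kalman_1d(noisy, q=1.0, r=0.25)        # large Q: tracks the noisy measurements
```

With a small Q the filter averages out the measurement noise, whereas a large Q makes it follow the noisy measurements closely, which is why the choice of Q and R matters for speech denoising.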
Protecting the copyrights of multimedia content is necessary to discourage unauthorized distribution or sharing of the content over the Internet. Digital watermarking helps in protecting the copyrights while reducing the monetary loss to the content owner. Digital watermarking is the process of inserting owner- or customer-related unique information as a watermark into a host signal such as audio or speech...
General voice-based access control systems are based on voice biometrics. Such systems allow unauthorized access by means of a recording of the authorized person's voice, so there is a need to prevent unauthorized access through recorded speech. Beyond voice biometrics, there are two challenges: (i) extracting the authentication information, and (ii) finding the unauthorized source. The speech...
This paper discusses the voice and audio quality characteristics of EVS, the recently standardized 3GPP codec. Frame erasure conditions in particular were evaluated. EVS was compared with the industry-standard voice codecs 3GPP AMR and AMR-WB, as well as with direct signals at varying bandwidths. Speech quality was evaluated with two subjective listening tests containing clean and noisy speech in the Finnish language...
As cellular IoT has been one of the key driving forces behind 5G, spectrally efficient support for heterogeneous services with quite different requirements is becoming ever more important, and the ability of OFDM and its variant SC-FDMA to support all 5G service scenarios is being questioned. The new waveforms (alternatives to OFDM) are mainly driven by low latency access, fragmented spectrum utilization, relaxed...
Perceptual audio hashing, which summarizes big audio data into a compact and robust digest, provides a useful tool for the identification, retrieval, authentication, and other information processing of multimedia content. This paper proposes a perceptual audio hashing algorithm based on the Radon transform in the wavelet domain. The wavelet approximation coefficients are mapped into a two-dimensional matrix...
Voice activity detection (VAD) is an imperative technique in many speech applications. An efficient and accurate VAD algorithm that is robust to background noise is proposed in this paper. By calculating permutation entropy (PE), the method can not only determine the presence or absence of speech, but also distinguish voiced and unvoiced parts of speech. Experiments under several noise cases have...
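The permutation entropy (PE) named in the abstract above can be sketched as follows. This is the standard ordinal-pattern definition, normalized to the range 0 to 1, and not the paper's VAD decision rule; the function name and the default `order` and `delay` values are assumptions for this example.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence.

    Each window of `order` samples (spaced by `delay`) is mapped to the
    permutation that sorts it; the Shannon entropy of the resulting
    pattern distribution is divided by log2(order!) so the result lies
    between 0 (fully regular) and 1 (all patterns equally likely).
    """
    n_patterns = len(signal) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("signal too short for the given order and delay")
    counts = Counter()
    for i in range(n_patterns):
        window = [signal[i + j * delay] for j in range(order)]
        # Ordinal pattern: indices of the window sorted by sample value.
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    entropy = -sum((c / n_patterns) * math.log2(c / n_patterns)
                   for c in counts.values())
    return entropy / math.log2(math.factorial(order))


# Hypothetical test signal: a monotonic ramp contains a single ordinal pattern.
ramp = list(range(100))
```

A frame-wise PE value computed this way is low for regular signals such as the ramp and high for noise-like signals; a detector could compare it against thresholds, although the thresholds and the voiced/unvoiced rule used in the paper are not given in the abstract.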
The advanced front-end (AFE) for automatic speech recognition (ASR) was standardized by the European Telecommunications Standards Institute (ETSI). The AFE provides speech enhancement realized by an iterative Wiener filter (IWF) in which a smoothed FFT spectrum over adjacent frames is used to design the filter. We have previously proposed robust time-varying complex AR (TV-CAR) speech analysis and...
In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. This is implemented by combining automatic asynchronous speech (microphone or mobile terminal) selection and environmental adaptation with a deep neural network-based framework. Although applications using mobile terminals have attracted increasing attention, there are few studies that focus...
Speech recognition has proven its significance in various online and offline applications, including authentication systems, translators, and voice command systems. However, because the voice is captured by an instrument, it suffers from various impurities caused by technical faults and environmental disturbances. All of these noise problems degrade the accuracy of speech recognition methods. In this paper,...