To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can efficiently investigate recognition performance for spontaneous speech. Such a method makes it possible to monitor recognition performance while providing speech recognition services. It can also be used as a reliability measure in speech dialogue systems. Previously, methods for estimating...
Accurate pitch extraction from speech is an important but challenging problem for speech synthesis. However, the additive nature and long-term suprasegmental property of pitch features have not been fully exploited in most existing pitch estimators, because they operate frame by frame. As a result, they cause inherent discontinuities, such as double/half F0 errors and unvoiced/voiced...
Sound source localization techniques are becoming popular as they provide effective information for parameter coding and reconstruction of the sound scene. A recent approach based on “single-source” zone detection was proposed. However, the method is not robust in noisy environments due to its DOA estimation principle. To overcome this issue, a mixture enhancement processing based multiple sound source...
An autoregressive model order and parameter estimation technique is proposed and applied to the modeling of Lithuanian semivowels. According to experimental results, adequate modeling of semivowels requires a model order of 72 on average. The appropriate order value differs for female and male voices. Moreover, there are notable differences between word-initial and word-medial phones: the latter are influenced...
The paper reports on the objective evaluation and comparison of two noise estimation algorithms for noisy speech signals. Both algorithms are based on the observation that local minima in the noisy speech spectrogram are close to the power level of the noise signal. The first algorithm directly searches the spectrogram for local minima and uses those values to update the noise power spectral density (PSD)...
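The observation that spectrogram minima sit near the noise power level can be illustrated with a small sketch. This is not either of the algorithms evaluated in the paper: it is a plain per-bin sliding-window minimum, and the function name `track_noise_floor`, the `window` parameter and the toy data are all assumptions made for illustration.

```python
def track_noise_floor(power_frames, window=5):
    """Per-bin sliding-window minimum of a power spectrogram.

    power_frames: list of frames, each a list of per-bin power values.
    window: number of most recent frames searched for the minimum
            (hypothetical default, not taken from the paper).
    """
    noise = []
    for t in range(len(power_frames)):
        start = max(0, t - window + 1)
        recent = power_frames[start:t + 1]
        # Assumption: within the window, each bin contains at least one
        # frame without speech, so the minimum approximates the noise power.
        noise.append([min(frame[k] for frame in recent)
                      for k in range(len(power_frames[t]))])
    return noise


# Toy spectrogram: noise power 1.0 in both bins, with a speech burst
# (power 9.0) in bin 0 during frames 2 and 3.
frames = [[1.0, 1.0], [1.0, 1.0], [9.0, 1.0], [9.0, 1.0], [1.0, 1.0]]
noise_estimate = track_noise_floor(frames, window=3)
```

With a window longer than the speech burst, the estimate stays at the noise level throughout; with `window=1` the estimate simply follows the noisy input, which shows why the window length matters.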
In this paper, a two-layer Gaussian Mixture Model (GMM) structure for Vector Taylor Series (VTS) feature compensation is proposed for robust speech recognition. Since a GMM with numerous mixture components is used for VTS, the computational complexity of VTS is extremely high. To deal with this issue, we propose a two-layer GMM structure for VTS. In detail, the GMM with fewer mixture components is utilized...
This paper focuses on the quality of speech coding parameter extraction under noisy and clean conditions. The influence of speech enhancement on the quality of extracted parameters for a low bit rate speech coder is addressed. The MELP vocoder is used to estimate three parameters: the fundamental frequency, voicing and linear prediction coefficients. De-noising methods in the MELPe vocoder and SMV are adopted...
The term “Quality of Speech” in speech enhancement techniques is associated with clarity and intelligibility. Because the nature and characteristics of noise vary with time and from process to process, speech enhancement remains a difficult problem in noisy environments. In this paper, we propose a method to improve the quality of speech based on a combination of Digital Audio Effects with Improved...
In natural environments, the speech signal is affected by various kinds of acoustic interference. Many applications in audio signal processing, such as automatic speech recognition, telecommunications and hearing aids, require an effective way of segregating the target speech from the mixed speech. Pitch information has an important role in the field of audio signal processing, especially...
For wireless remote access security, forensics, electronic commerce and surveillance applications, there is a growing need for biometric speaker identification systems to be robust to noise. This paper examines the robustness issue for the case of additive white noise at signal-to-noise ratios ranging from 0 to 30 dB. A Gaussian mixture model classifier based on adaptation of a universal background...
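A test condition of this kind, additive white noise at a chosen signal-to-noise ratio, can be produced as in the sketch below. It is an illustration only, not the paper's experimental setup: the Gaussian noise distribution, the synthetic tone standing in for speech, and the helper names `add_white_noise` and `measured_snr_db` are assumptions.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the gain so that signal_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Synthetic 200 Hz tone sampled at 8 kHz as a stand-in for speech.
clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noisy_by_snr = {snr: add_white_noise(clean, snr) for snr in (0, 10, 20, 30)}
```

Because the gain is computed from the actual power of the generated noise, `measured_snr_db(clean, noisy_by_snr[snr])` matches the requested value up to floating-point error for each condition.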
Fundamental frequency (F0) estimation plays an important role in speech processing tasks such as speech coding, synthesis and recognition. Although present F0 estimation methods perform well under clean conditions, their performance deteriorates significantly in noisy environments. For this reason, F0 estimation that is robust against additive noise is in demand. We have previously proposed F0 estimation methods...
Noise corruption can dramatically decrease speech intelligibility for listeners with cochlear implants (CIs). Noise reduction is a key element of CI speech processing strategies. This paper proposes a statistical model based noise reduction algorithm for CIs. A realistic noise estimator, which requires no prior knowledge of the noise, is adopted for noise estimation. An improved method for determining...
A new speech enhancement strategy is proposed that uses beta process factor analysis, a Bayesian nonparametric method. Within this sparse representation framework, dictionary learning, sparse coefficient representation and noise variance estimation are integrated into a joint procedure of Bayesian posterior estimation. The beta process is adopted as a sparse prior to infer the sparsity of the...
Estimating the amplitude spectrum of the noise signal is a very important part of many noise reduction systems. The conventional voice activity detection (VAD)-based method updates the amplitude spectrum estimate only in speech-absent regions and fails to deal with non-stationary noise. To overcome this problem, this paper proposes two methods to estimate the noise amplitude spectrum for non-stationary...
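The limitation of the conventional VAD-based update can be seen in a minimal sketch. This is not one of the two proposed methods: the energy-threshold VAD, the recursive smoothing with factor `alpha`, the function name `vad_noise_update` and the toy frames are all assumptions chosen for illustration.

```python
def vad_noise_update(amplitude_frames, energy_threshold, alpha=0.9):
    """Track the noise amplitude spectrum, updating only in speech-absent frames.

    A frame counts as speech-absent when its energy is below
    energy_threshold (a simple stand-in for a real VAD).
    """
    noise = list(amplitude_frames[0])  # assume the first frame is noise only
    history = []
    for frame in amplitude_frames:
        energy = sum(a * a for a in frame)
        if energy < energy_threshold:
            # Speech absent: recursive smoothing toward the current frame.
            noise = [alpha * n + (1 - alpha) * a for n, a in zip(noise, frame)]
        # Speech present: the previous estimate is kept unchanged.
        history.append(list(noise))
    return history


# Noise amplitude is 1.0 at first; after a speech burst in frame 2 the
# noise level rises to 3.0, which the fixed threshold mistakes for speech.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 4.0], [3.0, 3.0], [3.0, 3.0]]
estimates = vad_noise_update(frames, energy_threshold=10.0, alpha=0.5)
```

In this toy case the estimate stays at the initial noise level even after the noise has risen, because the louder noise frames are never classified as speech-absent. This is the kind of failure with non-stationary noise that the abstract refers to.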
The article presents the results of signal analysis of recorded singing voice samples. For this study, recorded samples of the “a-e-i-o-u” exercise are analysed. Some significant parameters describing the voice have been estimated. Among the estimated parameters are: pitch, calculated using the autocorrelation method, values of the first five harmonics, a set of parameters containing first five...
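Pitch estimation by autocorrelation, mentioned above, can be sketched as follows. The article's exact procedure is not given in the abstract, so this is a generic version: the search range `f_min`/`f_max`, the function name `autocorrelation_pitch` and the synthetic test tone are assumptions.

```python
import math


def autocorrelation_pitch(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate F0 in Hz as sample_rate / lag at the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag: sum of products of the signal
        # with a copy of itself shifted by `lag` samples.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic 200 Hz tone, 0.1 s at 8 kHz, standing in for a sung vowel.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0 = autocorrelation_pitch(tone, sample_rate)
```

For this tone the autocorrelation peaks at a lag of 40 samples, one full period, so the estimate is 200 Hz. Real voice recordings usually need windowing and some protection against octave errors, which this sketch leaves out.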
The Minimum Variance Distortionless Response (MVDR) beamformer is a popular multi-microphone noise reduction and speech enhancement strategy that can be implemented either as a fixed-constraint MVDR beamformer with a pre-defined Relative Transfer Function (RTF), or based on a Multi-channel Wiener Filter (MWF) estimate. However, neither implementation is fully robust within a dynamic acoustic environment...
This paper introduces novel two-channel a priori Signal-to-Noise Ratio (SNR) estimators for use in frequency-domain speech enhancement algorithms. The SNR estimation is based on statistics of the noisy phase difference between two microphones in each frequency bin. Namely, the corresponding probability distribution is derived assuming a complex Gaussian model, and is written in terms of the SNR only...
Speaker diarization is the task of estimating “who spoke when” in a meeting. To realize accurate diarization for real meetings, we have to deal with noise, speaker overlap, reverberation, etc. In this work, we propose to model directional statistics of spatial clusters via a dictionary of probabilistic models. The dictionary is trained using spatial features of possible source locations. Observed...
Many artificial speech bandwidth extension (ABE) approaches perform source-filter decomposition of the input narrowband speech, with subsequent computation of upper frequency band (UB) spectral envelope posteriors. In this paper we perform a direct comparison of HMM- and deep neural network (DNN)-based modeling of likelihoods or posteriors for ABE UB envelope estimation. DNN-based approaches turn...
Noise energy estimation is widely used as a pre-processing step in speech enhancement and speech recognition systems. While many signal processing algorithms have been proposed to estimate the additive noise energy, they are generally based on some statistical hypothesis and have high computational complexity, which is a critical issue on mobile devices. When the hypothesis does not hold, the estimation performance...