This paper analyses the performance of a large set of pitch detection algorithms on clean and noisy speech data. Two sets of noisy speech data are considered. The first consists of simulated noisy data, obtained by adding several types of noise signals at various levels to the clean speech data of the Pitch-Tracking Database from Graz University of Technology (PTDB-TUG). The second one, SPEECON,...
The fundamental frequency is one of the prosodic parameters, and many algorithms have been developed for estimating the fundamental frequency of speech signals. Most of them provide good results on good quality speech signals, but their performance degrades when dealing with noisy signals. Moreover, although some provide a probability for the voicing decision, none of them indicate how reliable the...
This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional...
This paper proposes an ensemble-based automatic speaker recognition (ASV) method using adapted score fusion in noisy reverberant environments. It is well known that background noise and reverberation affect the performance of ASV systems. Various techniques have been reported to improve the robustness against noise and reverberation, and an ensemble-based method is one of the effective techniques in the...
The paper reports on the objective evaluation and comparison of two noise estimation algorithms for noisy speech signals. Both algorithms are based on the observation that local minima in a noisy speech spectrogram are close to the power level of the noise signal. The first algorithm directly searches the spectrogram for local minima and uses those values to update the noise power spectral density (PSD)...
This paper proposes a new cough detection system based on audio signals acquired from conventional smartphones. The system relies on local Hu moments to characterize cough events and a Λ-NN classifier to distinguish cough events from non-cough ones (speech, laughing, sneezing, etc.) and noisy sounds. To deal with the imbalance between classes, we employ Distinct-Borderline2 Synthetic Minority Oversampling...
Many researchers have demonstrated the good performance of spoofing detection systems under clean training and testing conditions. However, it is well known that the performance of speaker and speech recognition systems significantly degrades in noisy conditions. Therefore, it is of great interest to investigate the effect of noise on the performance of spoofing detection systems. In this paper, we...
In this paper, we analyse the effect of frame size and frame shift on the detection of the vowel onset point (VOP) under clean and noisy conditions, with the aim of making VOP detection more accurate in practical scenarios. For VOP detection we use the state-of-the-art technique, which combines complementary evidence from the excitation source, spectral peaks, and modulation spectrum. We carry out our experiments...
This paper addresses the problem of speaker identification in noisy conditions. A two-step noise reduction algorithm based on a soft mask and a minimum mean square error short-time spectral amplitude estimator is proposed. It is used in the signal preprocessing stage for more robust speaker identification. The proposed algorithm was tested and compared with the existing noise reduction algorithms in...
Wind noise is one of the most significant issues for hearing aid users. In this paper, a contribution to this issue is made by using binaural phase and level differences. Most sounds, including speech signals, carry directional information; that is, the interaural phase difference (IPD) and interaural level difference (ILD) do not vary if the sound direction is fixed. However, wind noise has no directional information,...
This paper develops an algorithm, “Discrete Wavelet Transform with Adaptive Filter” (DWTAF), to transform neutral speech into emotional speech such as angry, happy or sad, and compares it with two other emotion transformation algorithms. The other two algorithms are “Speech Transformation using Statistical Parameters and Pitch Contours” (STSPPC) and “Speech Transformation using Mel Frequency Cepstral...
Little research has been conducted on Uyghur speaker recognition. Among the limited works, researchers usually collect small speech databases and publish results based on their own private data. This ‘closed-door evaluation’ makes most of the publications questionable. This paper publishes an open and free speech database, THUYG-20 SRE, and a benchmark for Uyghur speaker recognition. The database is based...
There are many types of degradation which can occur in Voice over IP calls. Degradations which occur independently of the codec, hardware, or network in use are the focus of this paper. The development of new quality metrics for modern communication systems depends heavily on the availability of suitable test and development data with subjective quality scores. A new dataset of VoIP degradations (TCD-VoIP)...
In this paper, a signal processing approach is proposed for speech/nonspeech discrimination. The approach is based on single frequency filtering (SFF), where the amplitude envelope of the signal is obtained at each frequency with high temporal and spectral resolution. This high resolution property helps to exploit the resulting high signal-to-noise ratio (SNR) regions in time and frequency. The variance...
In the analysis of speech production, information about the voice source can be obtained non-invasively with glottal inverse filtering (GIF) methods. Current state-of-the-art GIF methods are capable of producing high-quality estimates in suitable conditions (e.g. low noise and reverberation), but their performance deteriorates in nonideal conditions because they require noise-sensitive parameter estimation...
In this paper, the speech enhancement abilities of a new array-based processor are tested. The proposed system works in three cascaded stages. First, the signals are time-aligned with the estimated direction of the desired sound source. Second, the signal is decomposed into its allpass and minimum-phase components using cepstral processing. At this point, beamforming and liftering in cepstral domain...
Objective speech quality assessment is used to replace time-consuming and cumbersome subjective listening tests when assessing the quality of degraded speech processed by different speech processing algorithms. For performance evaluation, all objective speech quality assessment algorithms require the Mean Opinion Score-Listening Quality Subjective (MOS-LQS) or subjective MOS obtained from the subjective...
This paper presents the performance analysis of Alize/Lia_Ral algorithms in forensic speaker verification applications. In particular, in this work we evaluate the performance impact of speech signal degradation considering the background noise level, speech rate variation, audio signal length used for testing, GSM radio channel, etc. The Alize/Lia_Ral platform has demonstrated a strong dependence...
A cochlear implant (CI) is a hearing aid for people with profound deafness, i.e. the inability to respond to a sound stimulus above 90 dB SPL. The main problem of CI users is the inability to discriminate simultaneous incoming sounds, focusing on the desired sound (target) whilst ignoring the rest (the cocktail party problem). In this research, the release of masking strategy is introduced to give a glimpse of acoustical...
Here Formant Based Linear Prediction Coefficient (FBLPC) features are proposed for speaker identification for all environments. Gaussian Mixture Models (GMMs) are used for classification of speakers. The identification performance of Linear Prediction Coefficient (LPC) features is computed and compared with the identification performance of FBLPC features. The performance of FBLPC features is found...