Search results

chapter

Performance analysis of several pitch detection algorithms on simulated and real noisy speech data

Denis Jouvet, Yves Laprie

2017 25th European Signal Processing Conference (EUSIPCO) > 1614 - 1618

This paper analyses the performance of a large number of pitch detection algorithms on clean and noisy speech data. Two sets of noisy speech data are considered. The first corresponds to simulated noisy data, obtained by adding several types of noise signals at various levels to the clean speech data of the Pitch-Tracking Database from Graz University of Technology (PTDB-TUG). The second one, SPEECON,...

chapter

Towards confidence measures on fundamental frequency estimations

Boyuan Deng, Denis Jouvet, Yves Laprie, Ingmar Steiner, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5605 - 5609

The fundamental frequency is one of the prosodic parameters, and many algorithms have been developed for estimating the fundamental frequency of speech signals. Most of them provide good results on good quality speech signals, but their performance degrades when dealing with noisy signals. Moreover, although some provide a probability for the voicing decision, none of them indicate how reliable the...

chapter

Evaluating automatic speech recognition systems in comparison with human perception results using distinctive feature measures

Xiang Kong, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5810 - 5814

This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional...

chapter

Ensemble based speaker verification using adapted score fusion in noisy reverberant environments

Ryosuke Nakanishi, Sayaka Shiota, Hitoshi Kiya

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

This paper proposes an ensemble-based automatic speaker verification (ASV) system using adapted score fusion in noisy reverberant environments. It is well known that background noise and reverberation degrade the performance of ASV systems. Various techniques have been reported to improve robustness against noise and reverberation, and an ensemble-based method is one of the effective techniques in the...

chapter

Evaluation of noise estimation algorithms based on minimum statistics and signal to noise ratio

Niksa Jakovljevic, Dragisa Miskovic, Zeljen Trpovski

2016 24th Telecommunications Forum (TELFOR) > 1 - 4

The paper reports on the objective evaluation and comparison of two noise estimation algorithms for noisy speech signals. Both algorithms are based on the observation that local minima in the noisy speech spectrogram are close to the power level of the noise signal. The first algorithm directly searches the spectrogram for local minima and uses those values to update the noise power spectral density (PSD)...
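A minimal sketch of the minimum-tracking idea described above, in Python with NumPy. It is not either of the two algorithms evaluated in the paper: the smoothing constant `alpha`, the search window length `window`, the constant `bias` factor and the function name `min_stats_noise_psd` are illustrative assumptions. Each frequency bin of the periodogram is smoothed recursively over time, and the noise PSD is taken as the minimum of the smoothed power over the most recent `window` frames, scaled up because a minimum underestimates the mean noise power.

```python
import numpy as np


def min_stats_noise_psd(power_spec, alpha=0.85, window=50, bias=1.5):
    """Track the noise PSD of a noisy-speech spectrogram via local minima.

    power_spec: array (n_frames, n_bins) holding |STFT|^2 of the noisy speech.
    alpha, window and bias are illustrative values, not tuned parameters.
    """
    power_spec = np.asarray(power_spec, dtype=float)
    n_frames = power_spec.shape[0]
    smoothed = np.empty_like(power_spec)
    noise_psd = np.empty_like(power_spec)
    state = power_spec[0].copy()
    for t in range(n_frames):
        # First-order recursive smoothing of the periodogram in every bin.
        state = alpha * state + (1.0 - alpha) * power_spec[t]
        smoothed[t] = state
        # Minimum of the smoothed power over the most recent `window` frames.
        start = max(0, t - window + 1)
        noise_psd[t] = bias * smoothed[start:t + 1].min(axis=0)
    return noise_psd


# Toy input: unit noise floor in 4 bins with a loud "speech" burst in frames 20-24.
spec = np.ones((40, 4))
spec[20:25] = 100.0
estimate = min_stats_noise_psd(spec)
print(estimate[22])  # about 1.5 in every bin: the burst does not lift the estimate
```

Because speech bursts rarely fill the whole search window, the tracked minimum follows the noise floor rather than the speech level; a longer window is more robust to long speech segments but reacts more slowly when the noise level rises.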

chapter

Effect of importance sampling on robust segmentation of audio-cough events in noisy environments

Jesus Monge-Alvarez, Carlos Hoyos-Barcelo, Paul Lesso, Javier Escudero, et al.

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3740 - 3744

This paper proposes a new cough detection system based on audio signals acquired from conventional smartphones. The system relies on local Hu moments to characterize cough events and a k-NN classifier to distinguish cough events from non-cough ones (speech, laughter, sneezing, etc.) and noisy sounds. To deal with the class imbalance, we employ Distinct-Borderline2 Synthetic Minority Oversampling...

chapter

Effect of multi-condition training and speech enhancement methods on spoofing detection

Hong Yu, Achintya Sarkar, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, et al.

2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE) > 1 - 5

Many researchers have demonstrated the good performance of spoofing detection systems under clean training and testing conditions. However, it is well known that the performance of speaker and speech recognition systems significantly degrades in noisy conditions. Therefore, it is of great interest to investigate the effect of noise on the performance of spoofing detection systems. In this paper, we...

chapter

Significance of frame size and frame shift on vowel onset point detection

Nirupam Shome, Saharul Alom Barlaskar, R H Laskar

2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) > 1272 - 1276

In this paper, we analyse the effect of frame size and frame shift on the detection of the vowel onset point (VOP) under clean and noisy conditions, with the aim of making VOP detection more accurate in practical scenarios. For VOP detection we use the state-of-the-art technique which combines the complementary evidence from the excitation source, spectral peaks, and modulation spectrum. We carry out our experiments...

chapter

Two-step noise reduction based on soft mask for robust speaker identification

Gennadiy Tupitsin, Artem Topnikov, Andrey Priorov

2016 18th Conference of Open Innovations Association and Seminar on Information Security and Protection of Information Technology (FRUCT-ISPIT) > 351 - 356

This paper addresses the problem of speaker identification in noisy conditions. A two-step noise reduction algorithm based on a soft mask and the minimum mean square error short-time spectral amplitude estimator is proposed. It is used in the signal preprocessing stage to make speaker identification more robust. The proposed algorithm was tested and compared with existing noise reduction algorithms in...

chapter

Binaural wind noise detection, cancellation and its evaluation for hearing aids based on HRTF cues

Hidetoshi Nakashima, Ryousuke Kouyama, Nobuhiko Hiruma, Yoh-ichi Fujisaka

IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society > 4896 - 4899

Wind noise is one of the most significant issues for hearing aid users. In this paper, a contribution to this issue is made by using the binaural phase and level differences. Most sounds, including speech, carry directional information; that is, the interaural phase difference (IPD) and interaural level difference (ILD) do not vary if the sound direction is fixed. However, wind noise has no directional information,...
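The cue described above can be illustrated with a short NumPy sketch: it computes frame-wise IPD and ILD from left and right ear signals and measures how much the IPD fluctuates over time using its circular variance. The frame length, hop size, Hann window, the use of circular variance as a directionality score and the function names `interaural_cues` and `ipd_circular_variance` are assumptions made for illustration; this is not the detector proposed in the paper. A source at a fixed direction yields a stable IPD (circular variance near 0), whereas signals that are uncorrelated between the ears yield a near-random IPD (circular variance near 1).

```python
import numpy as np


def interaural_cues(left, right, frame_len=256, hop=128, eps=1e-12):
    """Frame-wise IPD (radians) and ILD (dB) per FFT bin for two ear signals."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(left) - frame_len) // hop
    ipd, ild = [], []
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        spec_l = np.fft.rfft(window * left[seg])
        spec_r = np.fft.rfft(window * right[seg])
        # Phase of the cross-spectrum = phase difference between the ears.
        ipd.append(np.angle(spec_l * np.conj(spec_r)))
        # Power ratio between the ears, in decibels.
        ild.append(10 * np.log10((np.abs(spec_l) ** 2 + eps) / (np.abs(spec_r) ** 2 + eps)))
    return np.array(ipd), np.array(ild)


def ipd_circular_variance(ipd):
    """Per-bin circular variance of the IPD over frames (0 = stable, 1 = random)."""
    return 1.0 - np.abs(np.mean(np.exp(1j * ipd), axis=0))


rng = np.random.default_rng(0)
source = rng.standard_normal(16000)

# Directional case: same signal at both ears, 6 dB quieter on the right.
ipd_dir, ild_dir = interaural_cues(source, 0.5 * source)
# Wind-like case: independent noise at each ear.
ipd_wind, _ = interaural_cues(source, rng.standard_normal(16000))

print(ipd_circular_variance(ipd_dir).mean())   # close to 0
print(ipd_circular_variance(ipd_wind).mean())  # close to 1
print(np.median(ild_dir))                       # about 6.02 dB
```

Circular variance is used here instead of an ordinary variance because the IPD wraps around at plus or minus pi; averaging unit phasors avoids treating phases just either side of the wrap point as far apart.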

chapter

Development of a transformation algorithm for emotional speech signal using DWT and Adaptive Filter for a Voice Culture Training System

Bageshree Sathe-Pathak, Ashish Panat

TENCON 2015 - 2015 IEEE Region 10 Conference > 1 - 5

This paper develops an algorithm, “Discrete Wavelet Transform with Adaptive Filter” (DWTAF), to transform neutral speech into emotional speech such as angry, happy or sad speech, and compares it with two other emotion transformation algorithms. The other two algorithms are “Speech Transformation using Statistical Parameters and Pitch Contours” (STSPPC) and “Speech Transformation using Mel Frequency Cepstral...

chapter

An open/free database and Benchmark for Uyghur speaker recognition

Askar Rozi, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 81 - 85

Little research has been conducted on Uyghur speaker recognition. Among the limited works, researchers usually collect small speech databases and publish results based on their own private data. This ‘closed-door evaluation’ makes most of the publications questionable. This paper publishes an open and free speech database, THUYG-20 SRE, and a benchmark for Uyghur speaker recognition. The database is based...

chapter

TCD-VoIP, a research database of degraded speech for assessing quality in VoIP applications

Naomi Harte, Eoin Gillen, Andrew Hines

2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX) > 1 - 6

There are many types of degradation which can occur in Voice over IP calls. Degradations which occur independently of the codec, hardware, or network in use are the focus of this paper. The development of new quality metrics for modern communication systems depends heavily on the availability of suitable test and development data with subjective quality scores. A new dataset of VoIP degradations (TCD-VoIP)...

article

Single Frequency Filtering Approach for Discriminating Speech and Nonspeech

G. Aneeja, B. Yegnanarayana

IEEE/ACM Transactions on Audio, Speech, and Language Processing > 2015 > 23 > 4 > 705 - 717

In this paper, a signal processing approach is proposed for speech/nonspeech discrimination. The approach is based on single frequency filtering (SFF), where the amplitude envelope of the signal is obtained at each frequency with high temporal and spectral resolution. This high resolution property helps to exploit the resulting high signal-to-noise ratio (SNR) regions in time and frequency. The variance...
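A minimal NumPy sketch of single frequency filtering, assuming the formulation commonly associated with SFF: the signal is multiplied by a complex exponential that moves the analysis frequency to half the sampling rate, then passed through a single-pole filter whose pole lies at z = -r close to the unit circle, and the magnitude of the complex output is taken as the amplitude envelope at that frequency. The pole radius r = 0.995, the function name `sff_envelopes` and the test tone are illustrative assumptions; the subsequent variance-based decision described in the paper is not reproduced.

```python
import numpy as np


def sff_envelopes(x, freqs_hz, fs, r=0.995):
    """Amplitude envelope of x at each analysis frequency (rows) and sample (columns)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    envelopes = np.empty((len(freqs_hz), len(x)))
    for k, f in enumerate(freqs_hz):
        # Shift the analysis frequency f to fs/2 (omega = pi), where the pole sits.
        shift = np.pi - 2 * np.pi * f / fs
        shifted = x * np.exp(1j * shift * n)
        # Single-pole filter y[n] = -r * y[n-1] + x_k[n], i.e. pole at z = -r.
        y = np.empty(len(x), dtype=complex)
        state = 0j
        for i, sample in enumerate(shifted):
            state = -r * state + sample
            y[i] = state
        envelopes[k] = np.abs(y)
    return envelopes


fs = 8000
t = np.arange(2000) / fs
tone = np.cos(2 * np.pi * 1000 * t)  # 1 kHz test tone
env = sff_envelopes(tone, [1000, 2000], fs)
print(env[0, 1000:].mean(), env[1, 1000:].mean())  # large at 1 kHz, small at 2 kHz
```

With r close to 1 the filter is very narrow in frequency while still producing an envelope value at every sample, which is the combination of temporal and spectral resolution the abstract refers to.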

chapter

Noise robust estimation of the voice source using a deep neural network

Manu Airaksinen, Tuomo Raitio, Paavo Alku

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5137 - 5141

In the analysis of speech production, information about the voice source can be obtained non-invasively with glottal inverse filtering (GIF) methods. Current state-of-the-art GIF methods are capable of producing high-quality estimates in suitable conditions (e.g. low noise and reverberation), but their performance deteriorates in nonideal conditions because they require noise-sensitive parameter estimation...

chapter

A new approach to dereverberation and noise reduction with microphone arrays

J. L. Sanchez-Bote, J. Gonzalez-Rodriguez, J. Ortega-Garcia

2000 10th European Signal Processing Conference > 1 - 4

In this paper, the speech enhancement abilities of a new array-based processor are tested. The proposed system works in three cascaded stages. First, the signals are time-aligned with the estimated direction of the desired sound source. Second, the signal is decomposed into its allpass and minimum-phase components using cepstral processing. At this point, beamforming and liftering in the cepstral domain...
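The cepstral decomposition in the second stage can be sketched for a single frame, assuming the standard folded real-cepstrum construction: folding the real cepstrum onto positive quefrencies gives a minimum-phase spectrum with the same magnitude as the original, and dividing the original spectrum by it leaves an allpass residual. The function name `minphase_allpass_split`, the frame length and the small `eps` guard are assumptions; the beamforming and liftering stages of the proposed processor are not shown.

```python
import numpy as np


def minphase_allpass_split(frame, eps=1e-12):
    """Split a frame's spectrum into minimum-phase and allpass factors."""
    spectrum = np.fft.fft(frame)
    n = len(spectrum)
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + eps)).real
    # Fold negative quefrencies onto positive ones (minimum-phase cepstrum).
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    half = n // 2
    if n % 2 == 0:
        folded[1:half] = 2 * cepstrum[1:half]
        folded[half] = cepstrum[half]
    else:
        folded[1:half + 1] = 2 * cepstrum[1:half + 1]
    min_phase = np.exp(np.fft.fft(folded))
    # Whatever phase the minimum-phase part does not explain is allpass.
    all_pass = spectrum / min_phase
    return min_phase, all_pass


rng = np.random.default_rng(1)
frame = rng.standard_normal(512)
min_phase, all_pass = minphase_allpass_split(frame)
print(np.allclose(np.abs(all_pass), 1.0))  # True: the residual is allpass
```

The even part of the folded cepstrum equals the original real cepstrum, so the magnitude of the minimum-phase factor matches the frame's magnitude spectrum and all of the remaining phase information ends up in the allpass factor.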

chapter

Comparison of subjective and objective speech quality assessment for different degradation / noise conditions

Rajesh Kumar Dubey, Arun Kumar

2015 International Conference on Signal Processing and Communication (ICSC) > 261 - 266

Objective speech quality assessment is used to replace time-consuming and cumbersome subjective listening tests for assessing the quality of degraded speech processed by different speech processing algorithms. For performance evaluation, all objective speech quality assessment algorithms require the Mean Opinion Score-Listening Quality Subjective (MOS-LQS), or subjective MOS, obtained from the subjective...

chapter

Robustness of forensic speaker verification systems based on Alize/Lia_Ral toolkit

Francesco Bellomo, Francesco Beritelli, Elisabetta Sciacca

2014 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS) Proceedings > 92 - 97

This paper presents the performance analysis of Alize/Lia_Ral algorithms in forensic speaker verification applications. In particular, in this work we evaluate the performance impact of speech signal degradation considering the background noise level, speech rate variation, audio signal length used for testing, GSM radio channel, etc. The Alize/Lia_Ral platform has demonstrated a strong dependence...

chapter

Release of masking and FAME performance evaluation to improve speech intelligibility on cochlear implant

Sena Sukmananda Suprapto, Dhany Arifianto, Sekartedjo

2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE) > 1 - 6

A cochlear implant (CI) is a hearing prosthesis for people with profound deafness, i.e., hearing thresholds above 90 dB SPL. The main problem for CI users is the inability to discriminate simultaneous incoming sounds, that is, to focus on the desired sound (target) while ignoring the rest (the cocktail party problem). In this research, the release of masking strategy is introduced to give a glimpse of acoustical...

chapter

Formant based linear prediction coefficients for speaker identification

Sumit Srivastava, Pratibha Nandi, G. Sahoo, Mahesh Chandra

2014 International Conference on Signal Processing and Integrated Networks (SPIN) > 685 - 688

In this paper, Formant Based Linear Prediction Coefficient (FBLPC) features are proposed for speaker identification in all environments. Gaussian Mixture Models (GMMs) are used for speaker classification. The identification performance of Linear Prediction Coefficient (LPC) features is computed and compared with that of FBLPC features. The performance of FBLPC features is found...

INFONA - science communication portal
