There are many popular algorithms for recognizing the human voice. A good algorithm not only yields high recognition accuracy but is also robust to noise. Several experiments were conducted in this research to verify the performance of the Neuro-fuzzy system in recognizing the human voice. Eight Thai words recorded in different environments, syllables, and pronunciations are used as a data set...
A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...
Traditional sound event recognition methods based on informative front end features such as MFCC, with back end sequencing methods such as HMM, tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning...
The presence of the Lombard Effect in speech is proven to have severe effects on the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under the influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
Voice Activity Detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most, if not all, speech/audio processing methods depends crucially on the performance of Voice Activity Detection. An ideal voice activity detector needs to be independent of application area and noise condition and require the least parameter tuning in real...
Human listeners are capable of recognizing speech in noisy environments, while most traditional speech recognition methods do not perform well in the presence of noise. Unlike traditional Mel-frequency cepstral coefficient (MFCC)-based methods, this study proposes a phoneme classification technique using the neural responses of a physiologically based computational model of the auditory periphery...
In this paper we address the problem of musical genre recognition for a dancing robot with embedded microphones capable of distinguishing the genre of a musical piece while moving in a real-world scenario. For this purpose, we assess and compare two state-of-the-art musical genre recognition systems, based on Support Vector Machines and Markov Models, in the context of different real-world acoustic...
Enhancement of speech distorted by reverberation is a topical issue, and the problem has been actively studied in the last decade. However, it is still extremely difficult to find clear recommendations on the choice of the boundary value between early reflections and late reverberation that is optimal in the sense of criteria such as speech recognition accuracy and speech quality. Another problem is obtaining a simple pre-processor...
This paper addresses the problem of the automatic recognition of emotional states from speech recordings, especially those emotions reflecting that life or human integrity is at risk. The paper compares the performance of two different systems: one fed with speech signals recorded directly from the people (whole spectrum), and another in which the speech signals are recorded...
This paper proposes a novel approach to enhance the speech features in noise robustness for speech recognition. In the proposed approach, the speech feature time sequence is first converted into the modulation spectral domain via discrete Fourier transform (DFT). The magnitude part of the modulation spectrum is decomposed into overlapped non-uniform sub-band segments, and then each sub-band segment...
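The pipeline described in that abstract, a DFT over a feature's time trajectory followed by sub-band splitting of the magnitude spectrum, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the sub-band boundaries used here are arbitrary placeholders:

```python
import numpy as np

def modulation_spectrum(feature_traj):
    """DFT of one cepstral coefficient's time trajectory.

    feature_traj: 1-D array, the value of a single feature
    (e.g. one MFCC coefficient) across all frames of an utterance.
    Returns (magnitude, phase) of the modulation spectrum.
    """
    spec = np.fft.rfft(feature_traj)
    return np.abs(spec), np.angle(spec)

def split_subbands(magnitude, edges):
    """Split the magnitude spectrum into (possibly overlapping)
    sub-band segments given (start, end) bin indices."""
    return [magnitude[s:e] for s, e in edges]

# Example: a 200-frame trajectory of one feature coefficient.
traj = np.random.randn(200)
mag, phase = modulation_spectrum(traj)
# Non-uniform, overlapping sub-band layout (illustrative only).
bands = split_subbands(mag, [(0, 20), (15, 50), (40, len(mag))])
```

Each sub-band segment would then be processed (e.g. normalized or enhanced) independently before the modulation spectrum is inverted back to the feature domain, reusing the stored phase.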
In this paper, a noise robust formant frequency estimation scheme is developed utilizing the advantageous properties of the autocorrelation function of the band-limited noisy speech signal. It is shown that the use of autocorrelation operation on a speech signal, which is band-limited to a particular formant zone, in comparison to one without any band limitation, can provide higher noise immunity,...
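The core idea of that scheme, band-limiting the signal to a formant zone before taking the autocorrelation, can be illustrated with a toy example. This sketch is not the paper's estimator; the FFT-mask band-limiting and the tone-in-noise test signal are stand-ins chosen for brevity:

```python
import numpy as np

def autocorr(x):
    """One-sided autocorrelation of a signal."""
    r = np.correlate(x, x, mode="full")
    return r[len(x) - 1:]

def bandlimit(x, fs, lo, hi):
    """Crude band-limiting via FFT masking to the zone [lo, hi] Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

# A 500 Hz tone buried in noise; band-limit to 300-700 Hz, then
# estimate the dominant frequency from the autocorrelation peak.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + 0.5 * np.random.randn(len(t))
r = autocorr(bandlimit(x, fs, 300, 700))
lag = np.argmax(r[4:]) + 4   # skip the zero-lag region
est = fs / lag               # expected near 500 Hz
```

Because out-of-band noise is removed before the autocorrelation, the periodic peak associated with the in-band component stands out more clearly than it would for the full-band noisy signal, which is the noise-immunity property the abstract refers to.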
This paper deals with a post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive (containing hundreds of thousands of recordings). The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings...
In this paper we propose Fourier-Bessel cepstral coefficient (FBCC) features for robust speech recognition. The Fourier-Bessel representation of the speech signal is obtained using Bessel functions as a basis set. The FBCC are extracted from zeroth-order Bessel coefficients, taking into account the perceptual characteristics of the human auditory system. Recognition accuracy is measured using the CMU...
Phenomena such as filled pauses, laughter, breathing, and hesitation play a significant role in everyday human-to-human conversation and have a significant influence on speech recognition accuracy [1]. Because of their nature (e.g. long duration), they should be modeled with different numbers of emitting states and Gaussian mixtures. In this paper we address this question and try to determine the most...
One of the most effective approaches to noise robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, which is a typical method for performing MFCC enhancement, provides limited improvement when the noise is highly non-stationary. This is because the VTS enhancement method cannot use a time-varying noise model to keep the computational...
This paper presents a new feature extraction algorithm called Power Normalized Cepstral Coefficients (PNCC) that is based on auditory processing. Major new features of PNCC processing include the use of a power-law nonlinearity that replaces the traditional log nonlinearity used in MFCC coefficients, a noise-suppression algorithm based on asymmetric filtering that suppresses background excitation, and...
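The contrast between the two nonlinearities mentioned in that abstract can be shown in a few lines. A minimal sketch, not the full PNCC pipeline; the 1/15 exponent follows the published PNCC design, and the log floor is an arbitrary choice here:

```python
import numpy as np

def log_compression(power):
    """Conventional MFCC-style log nonlinearity (with a small floor)."""
    return np.log(np.maximum(power, 1e-10))

def power_law_compression(power, exponent=1.0 / 15.0):
    """Power-law nonlinearity of PNCC (exponent ~1/15).

    Unlike the log, it approaches zero smoothly as the filterbank
    power approaches zero, which behaves better in low-energy,
    noise-dominated channels.
    """
    return np.power(power, exponent)

p = np.array([1e-12, 1e-3, 1.0, 1e3])
print(log_compression(p))        # large negative values for tiny powers
print(power_law_compression(p))  # stays in a compact non-negative range
```

The bounded behavior near zero power is one reason the power-law stage degrades more gracefully than the log when noise suppression leaves residual low-energy channels.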
Previous work has shown that spectro-temporal features reduce the word error rate for automatic speech recognition under noisy conditions. These systems, however, required significant hand-tuning in order to determine which spectral and temporal modulations should be included in a particular stream. In this work, streams are split into one spectral and temporal modulation each and their posterior...
Eigenvoice and vector Taylor series (VTS) are good models for speaker differences and environmental variations separately. However, speaker and environmental variation always coexist in real-world speech. In this paper, we propose to combine eigenvoice and VTS. Specifically, we introduce eigenvoice speaker modeling for the clean speech into VTS's nonlinear mismatch function. In contrast, the standard...
In this paper, speech recognition accuracy improvement is addressed for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, to achieve high recognition accuracy for the early utterances of the target speaker, we introduce a new rapid ICA initialization method combining robot image information and a prestored initial separation filter bank. From this image information, an ICA initial...
We propose Cross-Channel Spectral Subtraction (CCSS), a source separation method for recognizing meeting speech where one microphone is prepared for each speaker. The method quickly adapts to changes in transfer functions and uses spectral subtraction to suppress the speech of other speakers. Compared with conventional source separation methods based on independent component analysis (ICA) or that...
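The spectral subtraction operation that CCSS builds on can be sketched for a single STFT frame. This is the generic textbook operation, not the authors' CCSS method; the over-subtraction factor and spectral floor are common heuristic parameters, not values from the paper:

```python
import numpy as np

def spectral_subtract(target_spec, interf_spec, alpha=1.0, floor=0.01):
    """Basic magnitude spectral subtraction for one STFT frame.

    target_spec: complex STFT frame from the target speaker's mic.
    interf_spec: complex STFT frame of the interfering speaker,
        assumed already scaled by an estimated transfer function.
    alpha: over-subtraction factor; floor: spectral floor that
        prevents negative magnitudes (reduces musical noise).
    """
    mag = np.abs(target_spec) - alpha * np.abs(interf_spec)
    mag = np.maximum(mag, floor * np.abs(target_spec))
    # Keep the target channel's phase.
    return mag * np.exp(1j * np.angle(target_spec))
```

In the cross-channel setting, the per-frequency scaling of `interf_spec` is what must track the changing transfer functions; the subtraction itself is cheap, which is why this family of methods adapts faster than full ICA-based separation.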