Items from 1 to 12 out of 12 results

chapter

Cepstral noise subtraction for robust automatic speech recognition

Robert Rehr, Timo Gerkmann

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 375 - 378

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window, and often a complete utterance is chosen. Consequently, changes in the background...

chapter

Automatic evaluation of hypernasality and speech intelligibility for children with cleft palate

Ling He, Jing Zhang, Qi Liu, Heng Yin, et al.

2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA) > 220 - 223

2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA)

The speech of cleft palate (CP) patients has typical characteristics. Hypernasality and low speech intelligibility are the primary characteristics of CP speech. In this work, an algorithm for the automatic evaluation of different levels of hypernasality and speech intelligibility in CP speech is proposed, in order to provide an objective tool for speech therapists. To identify different levels of hypernasality,...

chapter

Easy does it: Robust spectro-temporal many-stream ASR without fine tuning streams

Suman V. Ravuri, Nelson Morgan

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4309 - 4312

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Previous work has shown that spectro-temporal features reduce the word error rate for automatic speech recognition under noisy conditions. These systems, however, required significant hand-tuning in order to determine which spectral and temporal modulations should be included in a particular stream. In this work, streams are split into one spectral and temporal modulation each and their posterior...

chapter

Bangla ASR design by suppressing gender factor with gender-independent and gender-based HMM classifiers

Foyzul Hassan, Mohammed Rokibul Alam Kotwal, Mohammad Nurul Huda

2011 World Congress on Information and Communication Technologies > 1276 - 1281

2011 World Congress on Information and Communication Technologies (WICT)

Hidden factors such as gender characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic-likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...

chapter

Gender Effects Suppression in Bangla ASR by Designing Multiple HMM-Based Classifiers

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Md. Shafiul Alam, Shakib Ibn Daud, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 390 - 394

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

Speaker-specific characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic-likelihood among categories...

chapter

Hybrid Features for Neural Network-Based Bangla ASR Incorporating Velocity Coefficients (Δ)

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Shakib Ibn Daud, Md. Shafiul Alam, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 416 - 420

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

This paper presents a Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities; the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...

chapter

Comparing multilayer perceptron to Deep Belief Network Tandem features for robust ASR

Oriol Vinyals, Suman V. Ravuri

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4596 - 4599

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we extend the work done on integrating multilayer perceptron (MLP) networks with HMM systems via the Tandem approach. In particular, we explore whether the use of Deep Belief Networks (DBN) adds any substantial gain over MLPs on the Aurora2 speech recognition task under mismatched noise conditions. Our findings suggest that DBNs outperform single layer MLPs under the clean condition,...

chapter

Acoustic features for detection of aspirated stops

V Patil, P Rao

2011 National Conference on Communications (NCC) > 1 - 5

2011 National Conference on Communications (NCC)

Aspiration is an important phonemic feature in several Indian languages. Unlike English, languages such as Marathi have lexicons in which words with different meanings differ only in the aspiration feature of the initial voiced or unvoiced stop. Thus the reliable discrimination of aspirated stops from their unaspirated counterparts is important in automatic speech recognition for such languages. The...

chapter

A novel segmentation method of Sound-Packets for Bangla speech signal

M A N R Rahaman, A Das, M Z Nayen, M S Rahman

International Conference on Electrical & Computer Engineering (ICECE 2010) > 510 - 513

2010 6th International Conference on Electrical & Computer Engineering (ICECE 2010)

This paper describes several Sound-Packet segmentation techniques, which will facilitate Automatic Speech Recognition (ASR) for Bangla speech signals. The approximate duration of a sound-packet has been determined and an envelope-detection method has been presented to determine the end-points of sound-packets. The 1st difference method, based on a moving average of the 1st difference of the signal, is then...

chapter

Bangla speech recognition using two stage multilayer neural networks

Qamrun Nahar Eity, M Banik, N J Lisa, F Hassan, et al.

2010 International Conference on Signal and Image Processing > 222 - 226

2010 International Conference on Signal and Image Processing (ICSIP 2010)

This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities and ii) the phoneme probabilities obtained from the first stage and corresponding Δ and ΔΔ are inserted into another MLN to...

chapter

Auditory Features Revisited for Robust Speech Recognition

F Kelly, N Harte

2010 20th International Conference on Pattern Recognition > 4456 - 4459

2010 20th International Conference on Pattern Recognition (ICPR 2010)

Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC)...

chapter

Minimum variance modulation filter for robust speech recognition

Y.-H.B. Chiu, R.M. Stern

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 3917 - 3920

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper describes a way of designing a modulation filter by data-driven analysis, which improves the performance of automatic speech recognition systems that operate in real environments. The filter for each nonlinear channel output is obtained by a constrained optimization process which jointly minimizes the environmental distortion as well as the distortion caused by the filter itself. Recognition...

Filter options

Keywords:
ACCURACY
SPEECH RECOGNITION
AUTOMATIC SPEECH RECOGNITION
MEL FREQUENCY CEPSTRAL COEFFICIENT


INFONA - science communication portal
