This paper proposes a new noise-robust speech recognition method. Several noise reduction methods have been developed and applied under various noise conditions. However, for utterances with similar pronunciations, it is still difficult to achieve high recognition accuracy. In this paper, a new processing algorithm applied to the speech modulation spectrum is proposed...
Mobile communications are greatly affected by environmental noise, which can cause a significant deterioration in automatic speech recognition (ASR) system performance. In this paper, we present a new framework integrating a noise-robust front-end into distributed speech recognition (DSR) systems. Using the Aurora-2 speech database, the authors evaluate the development of an additional feature set...
This paper presents the use of lip-reading and Thai speech to control electronic devices in a vehicle. The Viola-Jones algorithm detects the face of the driver and the constrained local model detects their mouth area before three lips features are extracted. Hidden Markov models are utilized to recognize speech and lip movement, with the lip movement recognizer offering better accuracy than the speech...
This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. Our approach uses two stages of deep neural networks, where the first stage estimates the ideal ratio mask that separates speech from noise, and the second stage maps the ratio-masked speech to the clean speech activation matrices that are used for nonnegative...
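The ideal ratio mask mentioned in the abstract above assigns each time-frequency unit a gain between 0 and 1. A minimal sketch, assuming oracle per-unit speech and noise powers are available (the paper's first-stage DNN estimates this mask rather than computing it from oracle powers):

```python
import math

def ideal_ratio_mask(speech_power, noise_power):
    # Fraction of each time-frequency unit's energy attributed to speech:
    # IRM = sqrt(S / (S + N)) for per-unit powers S and N.
    return math.sqrt(speech_power / (speech_power + noise_power))

# toy per-unit powers (hypothetical values)
clean_unit = ideal_ratio_mask(9.0, 0.0)  # pure speech -> mask 1.0
noisy_unit = ideal_ratio_mask(1.0, 3.0)  # mostly noise -> mask 0.5
```

Applying the mask multiplies each noisy time-frequency magnitude by this gain, attenuating noise-dominated units.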
Previously, we applied distribution equalization to our HIerarchical Spectro-Temporal (HIST) features, using distributions estimated from the histogram of one or several utterances. Although a performance increase was observed in both cases, the improvement was small when the distribution was estimated from a single utterance. The aim here is to determine a parametric distribution from...
Unvoiced-voiced portions of cochannel speech contain considerable amounts of both voiced and unvoiced speech and play a significant role in separation. Motivated by recent developments in separation of speech from nonspeech noise, we propose a classification-based approach for unvoiced-voiced speech separation. A new feature set consisting of pitch-based features and gammatone frequency cepstral coefficients...
Gaussian mixture models (GMMs) have proven effective in modeling speech and other acoustic signals. In this study, we have used GMMs to model different noise sources, viz. subway, babble, car and exhibition. The expectation-maximization algorithm has been implemented to fit the models. Further, we present the ‘threshold’ method, which uses the energy coefficient of the Mel-frequency cepstral coefficients...
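As a rough illustration of the expectation-maximization fit mentioned above, here is a one-dimensional, two-component toy GMM trainer in plain Python (not the paper's noise models; the data and component count are arbitrary):

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iters=50):
    # Initialize means spread over the data range, unit variances, uniform weights.
    lo, hi = min(data), max(data)
    mus = [lo + (hi - lo) * (i + 1) / (k + 1) for i in range(k)]
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [ws[j] * gaussian_pdf(x, mus[j], vars_[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            vars_[j] = max(vars_[j], 1e-6)  # floor to avoid variance collapse
    return ws, mus, vars_

# two well-separated toy clusters near 0 and 5
weights, means, variances = em_gmm_1d([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
```

In practice one would fit multivariate GMMs on MFCC vectors per noise source; the structure of the E- and M-steps is the same.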
Some principles of cyclic feature-based signal detection and classification are described. α-profiles of the spectral coherence and the spectral correlation density (SCD) are considered for this purpose. The theoretical SCD α-profile of an OFDM/QAM signal with a cyclostationary signature is shown.
Research on a noisy Tibetan speech recognition algorithm based on a wavelet neural network (WNN) combined with auditory features was carried out in this paper. A recognition classifier based on the WNN was designed, and the Mel-frequency cepstral coefficient (MFCC) feature was given. Simulations of the given algorithm were then run under different signal-to-noise ratios (SNRs), and the results illustrated...
For the task of detecting shouted speech in a noisy environment, this paper introduces a system based on mel frequency cepstral coefficient (MFCC) feature extraction, unsupervised frame dropping and Gaussian mixture model (GMM) classification. The evaluation material consists of phonemically identical speech and shouting as well as environmental noise of varying levels. The performance of the shout...
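The MFCC front-ends used in the abstracts above start from the mel frequency warp. A minimal sketch of the HTK-style conversion and mel-spaced filter placement (the band count below is an arbitrary choice for illustration):

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale: mel = 2595 * log10(1 + f / 700).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# centre/edge frequencies of a mel-spaced filterbank between 0 Hz and 8 kHz
n_filters = 10  # hypothetical band count
edges = [mel_to_hz(hz_to_mel(8000.0) * i / (n_filters + 1))
         for i in range(n_filters + 2)]
```

Spacing the filter edges uniformly on the mel axis makes them dense at low frequencies and sparse at high frequencies, mimicking auditory resolution.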
The method which is called the “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach...
The efficiency of speech recognition systems in noise-free environments is impressive, but in the presence of environmental noise it deteriorates drastically. Environmental noise also affects human-to-human and human-to-machine communication, degrading both speech quality and intelligibility. Here, a speech recognition system is proposed in the presence...
When discrete Hidden-Markov-Model (HMM)-based recognition is performed, vector quantization (VQ) is used to transform continuous observations into sequences of discrete symbols. After VQ, the quantization error is not spread equally among the features. This impairs feature significance, which is important when features are selected, e.g. by applying Sequential Forward Selection (SFS). In...
We recently proposed a new algorithm to perform acoustic model adaptation to noisy environments, called Linear Spline Interpolation (LSI). In this method, the nonlinear relationship between clean and noisy speech features is modeled using linear spline regression. Linear spline parameters that minimize the error between the predicted noisy features and the actual noisy features are learned from...
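A linear spline is simply a piecewise-linear function through a set of knots. A toy evaluator, assuming sorted knot abscissae (the knot values here are made up; the paper learns the spline parameters by minimizing prediction error, which is not shown):

```python
import bisect

def linear_spline(x, knots_x, knots_y):
    # Evaluate the piecewise-linear function through (knots_x[i], knots_y[i]).
    # knots_x must be sorted; inputs outside the range are clamped to the ends.
    if x <= knots_x[0]:
        return knots_y[0]
    if x >= knots_x[-1]:
        return knots_y[-1]
    i = bisect.bisect_right(knots_x, x) - 1
    t = (x - knots_x[i]) / (knots_x[i + 1] - knots_x[i])
    return knots_y[i] + t * (knots_y[i + 1] - knots_y[i])

# toy clean-to-noisy feature mapping with three knots
y = linear_spline(0.5, [0.0, 1.0, 2.0], [0.0, 2.0, 3.0])  # -> 1.0
```

Each segment between adjacent knots contributes one linear piece, which is what lets the spline approximate the nonlinear clean-to-noisy mapping segment by segment.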
One of the weaknesses of speech recognition systems is their lack of robustness to background noise compared with human listeners under similar conditions. This paper proposes a 2D psychoacoustic modeling algorithm integrated with a feature extraction front-end for hidden Markov models (HMMs). The proposed algorithm incorporates properties of the human auditory system and applies them to the...
Phoneme recognition is an essential component of any robust speech decoder and has been tackled by many researchers. Speech feature extraction constitutes the front end module of any speech decoder: it plays an essential role and has a strong impact on the recognition performance. The research community is aggressively searching for more powerful solutions which combine the existing feature extraction...
In this paper, we motivate and introduce a novel vector quantization (VQ) scheme for distributing the quantization error among the quantized features of a continuous feature vector in a predefined manner. This is done by defining ratios between the individual quantization errors of the features and shaping the Voronoi cells accordingly. In a series of experiments we show that the novel approach is...
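The per-feature error imbalance discussed in the two VQ abstracts above is easy to see in a plain nearest-neighbour quantizer. A toy 2-D sketch with a made-up codebook (not the paper's error-shaping scheme, which deliberately reshapes the Voronoi cells):

```python
def quantize(vec, codebook):
    # Map a feature vector to its nearest codeword (squared Euclidean distance).
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda c: sqdist(vec, c))

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # toy 2-D codebook
v = (0.2, 0.9)
q = quantize(v, codebook)
per_feature_err = [abs(x - y) for x, y in zip(v, q)]  # unequal across features
```

Here the first feature absorbs twice the quantization error of the second; the proposed scheme instead fixes the ratios between these per-feature errors in advance.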
Transcription of music is the process of generating a symbolic representation such as a score sheet or a MIDI file from an audio recording of a piece of music. A statistical machine learning approach for detecting note onsets in polyphonic piano music is presented. An area from the spectrogram of the sound is concatenated into one feature vector. A cascade of boosted classifiers is used for dimensionality...
We consider the problem of word boundary detection in spontaneous speech utterances. Acoustic features have been well explored in the literature in the context of word boundary detection; however, in spontaneous speech of Switchboard-I corpus, we found that the accuracy of word boundary detection using acoustic features is poor (F-score ~ 0.63). We propose a new feature - that captures lexical cues...
A robust speech feature extraction method based on the power law of hearing and a non-uniform spectral compression technique is proposed, and the corresponding model compensation algorithm is given. The mismatch functions, reflecting the effects of additive noise and spectral compression, and the model compensation formulae are derived. The experimental results show that a significant improvement...