This paper presents a novel method for estimating the voicing information of individual frequency regions of speech spectra and its use in a text-independent speaker identification system. The voicing information is incorporated into the system in the form of a mask within a marginalization-based missing-feature model. Experiments were performed on speech data from the TIMIT database corrupted by stationary...
Speech recognition systems generally use delta and delta-delta (velocity and acceleration) coefficients to characterise the dynamics apparent in frame-based representations of speech. These coefficients can be thought of as the errors of simple predictors. This paper describes the use of error coefficients derived from more advanced (and accurate) forms of prediction and interpolation. Both overall...
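The delta coefficients mentioned above are usually computed with a simple regression over neighboring frames. The sketch below shows that standard formula (window half-width `N=2` is a common but illustrative choice), not the more advanced predictors the paper itself proposes:

```python
import numpy as np

def delta(features, N=2):
    """Standard delta (velocity) coefficients for a (frames, dims) array,
    via the usual regression over +/- N neighboring frames. A minimal
    sketch of the conventional baseline, not the paper's method."""
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Replicate edge frames so boundary deltas are defined.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for t in range(T):
        out[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                     for n in range(1, N + 1)) / denom
    return out
```

Applying the same formula to the delta stream yields the delta-delta (acceleration) coefficients. For a linear ramp of features, the interior deltas recover the slope exactly, which is what makes them interpretable as errors of a simple linear predictor.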
This paper presents a method for phone-dependent weighting within phonotactic models in automatic language identification. Based on statistical analysis of the phonetic-recognizer behaviour, a phone confidence measure is derived and used to weight the bigram probabilities during testing. The confidence corresponds to the expected decoding stability of individual phones. The proposed method was shown...
Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. In contrast, cochannel SID in reverberant conditions has not been addressed. This paper studies cochannel SID in both anechoic...
For automatic speech recognition (ASR) of lectures, the text of presentation slides is expected to be useful for adapting a language model, but slide text is not always available in machine-readable form. In this paper, we propose a language model adaptation framework that uses character recognition results of slide images in a lecture video. Since character recognition results contain many errors,...
We propose a novel approach for addressing automatic speech recognition (ASR) and natural language understanding (NLU) errors in an interactive spoken dialog system using targeted clarification (TC). TC applies when a spoken utterance is only partially recognized: it focuses a clarification question on the misrecognized part of the utterance. A key component of TC is accurate detection of localized ASR...
A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...
We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on past frames by a prediction DNN. The result from the correction DNN is fed back to the prediction DNN to make better predictions for future frames. In the PAC-RNN, we can consider that, given the new, current...
The recognition of contact names in mobile-device voice commands is a challenging problem. Some of the difficulties include potentially infinite vocabularies, low probability of contact tokens in the language model (LM), increased false triggering of contact voice commands when none are spoken, and very large and noisy contact name lists. In this paper we suggest solutions for each of these difficulties.
Hidden Markov Models (HMMs) are one of the most important techniques for modeling and classifying sequential data. Maximum Likelihood (ML) and (parametric and non-parametric) Bayesian estimation of the HMM parameters suffer from local maxima and can be especially time-consuming on massive datasets. In this paper, we extend the spectral learning of HMMs, a moment-matching learning technique free from...
Traditional sound event recognition methods based on informative front-end features such as MFCCs, with back-end sequencing methods such as HMMs, tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning...
The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window; often, a complete utterance is chosen. Consequently, changes in the background...
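Per-utterance CMVN as described above reduces, for each cepstral dimension, the feature stream to zero mean and unit variance. A minimal sketch (real systems may instead use sliding-window or online estimates of the statistics):

```python
import numpy as np

def cmvn(mfcc):
    """Cepstral mean and variance normalization over a whole utterance.
    `mfcc` is a (frames, coeffs) array; each coefficient dimension is
    shifted to zero mean and scaled to unit variance."""
    mu = mfcc.mean(axis=0)
    sigma = mfcc.std(axis=0) + 1e-10  # guard against division by zero
    return (mfcc - mu) / sigma
```

CMN alone would apply only the mean subtraction, which removes constant convolutional channel effects in the cepstral domain; the variance scaling additionally compensates for noise-induced compression of the coefficient dynamics.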
In this paper we explore one of the key aspects in building an emotion recognition system: generating suitable feature representations. We generate feature representations from both acoustic and lexical levels. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then generate different acoustic feature representations based...
For many years, filterbanks have been widely used as one step of frontend feature extraction for Automatic Speech Recognition (ASR). In this paper, we propose a unified framework for ASR frontends, by first moving the nonlinear amplitude scaling, and then combining the filterbank weights with the cosine basis vectors. As part of this framework, we also show that the delta terms used to encode feature...
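The key observation behind combining filterbank weights with the cosine basis vectors is that once the nonlinear amplitude scaling is moved out of the way, the filterbank and the DCT are both linear maps and collapse into a single matrix. A sketch with illustrative shapes (the filterbank weights here are random placeholders, not actual mel filters):

```python
import numpy as np

# Illustrative dimensions: 257 FFT bins, 26 filters, 13 cepstra.
n_fft_bins, n_filters, n_ceps = 257, 26, 13
rng = np.random.default_rng(0)
FB = rng.random((n_filters, n_fft_bins))         # stand-in filterbank weights

# DCT-II basis vectors, as used for cepstral coefficients.
k = np.arange(n_ceps)[:, None]
m = np.arange(n_filters)[None, :]
DCT = np.cos(np.pi * k * (m + 0.5) / n_filters)

# With the nonlinearity applied to the spectrum first, the two linear
# stages merge into one (n_ceps x n_fft_bins) transform.
combined = DCT @ FB
```

For any spectral frame `x`, `combined @ x` equals `DCT @ (FB @ x)` by associativity, so the unified frontend is a single matrix multiply per frame.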
Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...
Motivated by the recent progress in the use of deep learning techniques for acoustic speech recognition, we present in this paper a visual deep bottleneck feature (DBNF) learning scheme using a stacked auto-encoder combined with other techniques. Experimental results show that our proposed deep feature learning scheme yields approximately 24% relative improvement for visual speech accuracy. To the...
The presence of the Lombard Effect in speech has been shown to severely degrade the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under the influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
This paper presents a novel interactive method for recognizing handwritten words, using the inertial sensor data available on smart watches. The goal is to allow the user to write with a finger, and use the smart watch sensor signals to infer what the user has written. Past work has exploited the similarity of handwriting recognition to speech recognition in order to deploy HMM-based methods. In contrast...
Due to the large number of parameters in deep neural networks (DNNs), it is challenging to design a small-footprint DNN-based speech recognition system while maintaining high recognition performance. Even with a singular value decomposition (SVD) method and scalar quantization, the DNN model is still too large to be deployed on many mobile devices. Common practices like reducing the number...
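The SVD method referred to above replaces a large weight matrix with the product of two thin factors, shrinking the parameter count when the chosen rank is small. A minimal sketch (real pipelines typically fine-tune the network after the split and may add quantization on top):

```python
import numpy as np

def svd_compress(W, rank):
    """Low-rank factorization of a DNN weight matrix W (m x n):
    W is approximated as A @ B with A (m x rank) and B (rank x n),
    reducing m*n parameters to (m + n) * rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B
```

In a network, the original layer `W @ x` becomes two smaller layers `A @ (B @ x)`; when the true weight matrix is close to low rank, the accuracy loss before retraining is small.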
Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian mixtures are replaced by more discriminant classifiers, leading to an improved performance. Most of the time the classifiers used in such systems are neural...