This paper presents a method for phone-dependent weighting within phonotactic models in automatic language identification. Based on statistical analysis of the phonetic-recognizer behaviour, a phone confidence measure is derived and used to weight the bigram probabilities during testing. The confidence corresponds to the expected decoding stability of individual phones. The proposed method was shown...
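The idea of confidence-weighted bigram scoring can be sketched as follows; the bigram model, the confidence values, and all names here are illustrative placeholders, not the paper's actual statistics (which are derived from recognizer behaviour):

```python
import math

def weighted_bigram_score(phones, bigram_logprob, confidence):
    """Score a decoded phone sequence under a phonotactic bigram model,
    weighting each bigram log-probability by the confidence (expected
    decoding stability) of the current phone -- a minimal sketch."""
    score = 0.0
    for prev, cur in zip(phones, phones[1:]):
        w = confidence.get(cur, 1.0)  # illustrative per-phone weight
        score += w * bigram_logprob[(prev, cur)]
    return score

# Toy two-phone inventory with made-up probabilities and confidences.
bigram_logprob = {("a", "b"): math.log(0.5), ("b", "a"): math.log(0.2)}
confidence = {"a": 0.9, "b": 0.6}
s = weighted_bigram_score(["a", "b", "a"], bigram_logprob, confidence)
```

In a language-identification setup, the utterance would be scored this way against one phonotactic model per language, and the highest-scoring language wins.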
The most successful approach to speech and speaker recognition is to treat the speech signal as a stochastic pattern and to use a statistical pattern recognition technique for matching utterances. This paper studies the performance of a text-dependent speaker verification system using Delta-Delta Mel-Frequency Cepstral Coefficient (MFCC-Δ-Δ) feature vectors and Fuzzy C-means (FCM) speaker...
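For illustration, the standard delta (regression) formula can be written in a few lines of plain Python; applying it twice yields the delta-delta (acceleration) features of MFCC-Δ-Δ. The one-dimensional toy track below is illustrative, not data from the paper:

```python
def delta(frames, N=2):
    """First-order delta features via the standard regression formula
    over +/-N neighbouring frames, with edge frames replicated.
    `frames` is a list of per-frame feature vectors (lists of floats)."""
    T, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for d in range(dim):
            num = sum(n * (frames[min(t + n, T - 1)][d]
                           - frames[max(t - n, 0)][d])
                      for n in range(1, N + 1))
            vec.append(num / denom)
        out.append(vec)
    return out

# Delta-delta is simply delta() applied twice to the static MFCCs.
mfcc = [[0.0], [1.0], [2.0], [3.0]]  # toy 1-D "MFCC" track
d1 = delta(mfcc)   # velocity
d2 = delta(d1)     # acceleration (the Δ-Δ term)
```

The static, delta, and delta-delta vectors are then concatenated per frame before clustering or verification.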
Monaural singing voice separation has attracted considerable attention. Many pitch-based methods have been proposed to address this task, but they generally have limited performance. The most crucial difficulties lie in inaccurate estimation of voiced pitches and failure to recognize unvoiced singing sounds. In this paper, we propose a novel algorithm based on the latent component analysis of time-frequency...
Current benchmark speech-based depression detection techniques rely on acoustic speech parameters extracted from large sets of representative speech recordings. This study is the first to investigate depression detection based on higher-order influence model (HOIM) coefficients and emotional transition parameters derived from a relatively small set of conversational speech recordings representing...
Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in anechoic environments under the Gaussian mixture model (GMM) framework; cochannel SID in reverberant conditions, however, has not been addressed. This paper studies cochannel SID in both anechoic...
For automatic speech recognition (ASR) of lectures, texts of presentation slides are expected to be useful for adapting a language model, while slide texts are not always available in a machine-readable form. In this paper, we propose a language model adaptation framework that uses character recognition results of slide images in a lecture video. Since character recognition results contain many errors,...
We propose a novel approach for addressing automatic speech recognition (ASR) and natural language understanding (NLU) errors in an interactive spoken dialog system using targeted clarification (TC). TC applies when a spoken utterance is partially recognized by focusing a clarification question on the misrecognized part of the utterance. A key component of TC is accurate detection of localized ASR...
Indoor localization of multiple speech sources in wireless acoustic sensor networks (WASNs) is an open and interesting problem with many practical applications, but the presence of noise and reverberation complicates the problem. In this paper, a distributed algorithm for multiple DOA estimation of speech sources in WASNs is presented. The method exploits the sparsity of speech sources in the time-frequency...
A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...
We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on past frames by a prediction DNN. The result from the correction DNN is fed back to the prediction DNN to make better predictions for future frames. In the PAC-RNN, we can consider that, given the new, current...
In spatial audio analysis-synthesis, one of the key issues is to decompose a signal into primary and ambient components based on their spatial features. Principal component analysis (PCA) has been widely employed in primary component extraction, and shifted PCA (SPCA) is employed to enhance the primary extraction for input signals involving the inter-channel time difference. However, SPCA generally...
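Primary component extraction via PCA can be sketched for a two-channel signal: the principal eigenvector of the 2x2 channel covariance gives the dominant (primary) direction, and the orthogonal residual is treated as ambience. The signals below are toy values; a real system operates per time-frequency tile, and this sketch does not include the time-difference shift that SPCA adds:

```python
import math

def pca_primary_ambient(left, right):
    """Split a stereo pair into primary (dominant-direction projection)
    and ambient (orthogonal residual) parts via PCA of the 2x2
    channel covariance -- a minimal sketch of primary extraction."""
    n = len(left)
    ml, mr = sum(left) / n, sum(right) / n
    cll = sum((l - ml) ** 2 for l in left) / n
    crr = sum((r - mr) ** 2 for r in right) / n
    clr = sum((l - ml) * (r - mr) for l, r in zip(left, right)) / n
    # Largest eigenvalue of [[cll, clr], [clr, crr]] in closed form.
    lam = 0.5 * (cll + crr + math.sqrt((cll - crr) ** 2 + 4 * clr ** 2))
    vx, vy = clr, lam - cll            # unnormalized principal eigenvector
    norm = math.hypot(vx, vy) or 1.0
    ux, uy = vx / norm, vy / norm
    primary = [l * ux + r * uy for l, r in zip(left, right)]
    ambient = [-l * uy + r * ux for l, r in zip(left, right)]
    return (ux, uy), primary, ambient

# Right channel is a scaled copy of the left: fully correlated input,
# so the ambient residual should vanish.
u, primary, ambient = pca_primary_ambient([1.0, 2.0, 3.0, 4.0],
                                          [2.0, 4.0, 6.0, 8.0])
```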
In this paper, we propose a low-rank tensor deconvolution problem which seeks multiway replicative patterns and corresponding activating tensors of rank-1. An alternating least squares (ALS) algorithm has been derived for the model to sequentially update loading components and the patterns. In addition, together with a good initialisation method using tensor diagonalization, the update rules have...
Traditional sound event recognition methods based on informative front end features such as MFCC, with back end sequencing methods such as HMM, tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning...
Main-stream speech codecs are based on modelling the speech source by a linear predictor. An efficient domain for quantization and coding of this linear predictor is the line spectral frequency representation, where the predictor is encoded into an ordered set of frequencies that correspond to the roots of the corresponding line spectral polynomials. While this representation is robust in terms of...
This paper investigates the effect of voice quality in conversational speech. Voice quality is often considered the characteristic auditory colouring of an individual speaker's voice, but in our study, we find that voice quality can also reveal information about the interlocutor in everyday social interactions. In the correlation analysis between acoustic measures and interlocutors, the effect caused...
This paper deals with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering depends heavily on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture...
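The cosine similarity used for i-vector scoring is just the normalized dot product of two utterance vectors; a minimal sketch with toy vectors (real i-vectors typically have several hundred dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two i-vectors (plain lists of
    floats), the standard score in i-vector speaker clustering."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-dimensional "i-vectors".
score = cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])
```

Clustering then proceeds by, e.g., agglomerative merging of the utterance pairs with the highest scores.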
The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window, often a complete utterance. Consequently, changes in the background...
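Utterance-level CMVN itself is straightforward: each cepstral dimension is shifted to zero mean and scaled to unit variance over the utterance. A minimal sketch with a toy two-frame, two-dimensional "utterance":

```python
import math

def cmvn(frames):
    """Cepstral mean and variance normalization: per-dimension
    zero-mean, unit-variance scaling over one utterance."""
    T, dim = len(frames), len(frames[0])
    out = [[0.0] * dim for _ in range(T)]
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / T
        var = sum((x - mean) ** 2 for x in col) / T
        std = math.sqrt(var) or 1.0   # guard constant dimensions
        for t in range(T):
            out[t][d] = (frames[t][d] - mean) / std
    return out

norm = cmvn([[1.0, 10.0], [3.0, 30.0]])
```

Because the statistics are pooled over the whole window, a mid-utterance change in the background noise is averaged into a single mean and variance, which is exactly the weakness the abstract points at.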
In this paper we explore one of the key aspects in building an emotion recognition system: generating suitable feature representations. We generate feature representations from both acoustic and lexical levels. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer and spectral contours. We then generate different acoustic feature representations based...
For many years, filterbanks have been widely used as a frontend feature-extraction step for Automatic Speech Recognition (ASR). In this paper, we propose a unified framework for ASR frontends by first moving the nonlinear amplitude scaling ahead of the filterbank, and then combining the filterbank weights with the cosine basis vectors. As part of this framework, we also show that the delta terms used to encode feature...
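The "combining" step rests on a simple fact: two linear maps compose into a single matrix once the nonlinearity is no longer between them. A toy sketch with made-up matrices (not the paper's actual filterbank or basis), where `x` stands for a spectrum to which the amplitude scaling has already been applied:

```python
def matmul(A, B):
    """Plain-Python matrix product (A is m x k, B is k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matvec(A, x):
    """Plain-Python matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

M = [[0.5, 0.5, 0.0],    # toy "filterbank" weights
     [0.0, 0.5, 0.5]]
C = [[1.0, 1.0],         # toy "cosine basis" rows
     [1.0, -1.0]]
x = [2.0, 4.0, 6.0]      # spectrum after the nonlinear amplitude scaling

two_step = matvec(C, matvec(M, x))   # filterbank, then basis projection
one_step = matvec(matmul(C, M), x)   # single combined transform C*M
```

With the nonlinearity in its conventional place (between `M` and `C`), no such single matrix exists, which is why the scaling has to move first.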
Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...