Search results

Items from 1 to 20 out of 32 results

chapter

Radio-browsing for developmental monitoring in Uganda

Raghav Menon, Armin Saeb, Hugh Cameron, William Kibira, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5795 - 5799

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We consider the extraction of information from broadcast radio speech in Uganda for the purposes of informing relief and development programmes by the United Nations. Although internet penetration in Uganda is low, mobile phones are ubiquitous and have made radio a vibrant medium for interactive public discussion. Vulnerable groups make use of radio to discuss issues related to, for example, agriculture,...

chapter

Faster sequence training

Albert Zeyer, Ilia Kulikov, Ralf Schluter, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5285 - 5289

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

It has been shown that sequence-discriminative training can improve the performance for large vocabulary continuous speech recognition. Our main contribution is a novel method for reducing the computation time of any sort of sequence training while only slightly decreasing the overall performance. The method allows to parallelize the forward propagation through the network, the loss and loss gradient...

chapter

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework

Naoyuki Kanda, Xugang Lu, Hisashi Kawai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4855 - 4859

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...

chapter

Training variance and performance evaluation of neural networks in speech

Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2287 - 2291

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition. Although this variance itself is well known, this is, to the best of our knowledge, the first paper that performs an extensive empirical study on its effects in speech recognition. We view training as sampling from a distribution and show that these distributions...

chapter

An LSTM-CTC based verification system for proxy-word based OOV keyword search

Zhiqiang Lv, Jian Kang, Wei-Qiang Zhang, Jia Liu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5655 - 5659

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Proxy-word based out of vocabulary (OOV) keyword search has been proven to be quite effective in keyword search. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...

chapter

Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

Ranjitha Gurunath Kulkarni, Ahmed El Kholy, Ziad Al Bawab, Noha Alon, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4985 - 4989

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic speech recognition systems can benefit from cues in user voice such as hyperarticulation. Traditional approaches typically attempt to define and detect an absolute state of hyperarticulation, which is very difficult, especially on short voice queries. We present a novel approach for hyperarticulation detection using pairwise comparisons and demonstrate its application in a real-world speech...

chapter

Look, listen, and decode: Multimodal speech recognition with images

Felix Sun, David Harwath, James Glass

2016 IEEE Spoken Language Technology Workshop (SLT) > 573 - 578

2016 IEEE Spoken Language Technology Workshop (SLT)

In this paper, we introduce a multimodal speech recognition scenario, in which an image provides contextual information for a spoken caption to be decoded. We investigate a lattice rescoring algorithm that integrates information from the image at two different points: the image is used to augment the language model with the most likely words, and to rescore the top hypotheses using a word-level RNN...

chapter

Lattice based transcription loss for end-to-end speech recognition

Jian Kang, Wei-Qiang Zhang, Jia Liu

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

End-to-end speech recognition systems have been successfully implemented and have become competitive replacements for hybrid systems. A common loss function to train end-to-end systems is connectionist temporal classification (CTC). This method maximizes the log likelihood between the feature sequence and the associated transcription sequence. However there are some weaknesses with CTC training. The...

chapter

Hybrid context dependent CD-DNN-HMM Keyword Spotting (KWS) in speech conversations

Vivek Tyagi

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 6

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

We present detailed analysis of phoneme recognition performance of a context dependent tied-state triphone Gaussian Mixture Model Hidden Markov Model (CD-GMM-HMM) acoustic model (state-of-the-art large acoustic model (AM)) and a four hidden layer context dependent Deep Neural Network (CD-DNN-HMM) AM on the WSJ speech corpus. Using a bigram phoneme language model, phoneme recognition experiments are...

chapter

A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training

Xiaojun Qian, Helen Meng, Frank Soong

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 384 - 387

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper presents a two-pass framework of mispronunciation detection and diagnosis (MD&D) — detection followed by diagnosis, without the need of explicit error pattern modeling, so that the main efforts can be devoted to improving acoustic modeling by discriminative training (or by applying alternative models like neural nets). The framework instantiates a set of anti-phones and a filler model...

chapter

Quality estimation for asr k-best list rescoring in spoken language translation

Raymond W. M. Ng, Kashif Shah, Wilker Aziz, Lucia Specia, more

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5226 - 5230

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Spoken language translation (SLT) combines automatic speech recognition (ASR) and machine translation (MT). During the decoding stage, the best hypothesis produced by the ASR system may not be the best input candidate to the MT system, but making use of multiple sub-optimal ASR results in SLT has been shown to be too complex computationally. This paper presents a method to rescore the k-best ASR output...

chapter

Improvements on transducing syllable lattice to word lattice for keyword search

Hang Su, Van Tung Pham, Yanzhang He, James Hieronymus

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4729 - 4733

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates a weighted finite state transducer (WFST) based syllable decoding and transduction method for keyword search (KWS), and compares it with sub-word search and phone confusion methods in detail. Acoustic context dependent phone models are trained from word forced alignments and then used for syllable decoding and lattice generation. Out-of-vocabulary (OOV) keyword pronunciations...

chapter

WFST-based structural classification integrating dnn acoustic features and RNN language features for speech recognition

Quoc Truong Do, Satoshi Nakamura, Marc Delcroix, Takaaki Hori

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4959 - 4963

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper proposes a method to train Weighted Finite State Transducer (WFST) based structural classifiers using deep neural network (DNN) acoustic features and recurrent neural network (RNN) language features for speech recognition. Structural classification is an effective approach to achieve highly accurate recognition of structured data in which the classifier is optimized to maximize the discriminative...

chapter

Towards better keyword search performance on Malay broadcast news data

Haihua Xu, Pham Van Tung, Eng-Siong Chng, Haizhou Li

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific > 1 - 5

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In this paper we describe approaches to building our recent Malay broadcast news audio retrieval system. This system contains speech-to-text and keyword search subsystems. The speech-to-text system is built aiming at two folds: hybrid vocabulary recognition to tackle out-of-vocabulary keyword search issue and diversified acoustic modeling for effective system combination in keyword searching afterwards...

chapter

Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers

Cheung-Chi Leung, Bin Ma, Haizhou Li

2012 8th International Symposium on Chinese Spoken Language Processing > 108 - 111

2012 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012)

In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Moreover, combining diversified phonotactic features is commonly used. These motivate us to have an in-depth investigation of combining diversified phonotactic features from diversely adapted acoustic models....

chapter

Semi-supervised discriminative language modeling for Turkish ASR

A. Celebi, H. Sak, E. Dikici, M. Saraclar, more

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5025 - 5028

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically, word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a...

chapter

Whole word discriminative point process models

Aren Jansen

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5180 - 5183

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper introduces a discriminative extension to whole-word point process modeling techniques. Meant to circumvent the strong independence assumptions of their generative predecessors, discriminative point process models (DPPM) are trained to distinguish the composite temporal patterns of phonetic events produced for a given word from those of its impostors. Using correct and incorrect word hypotheses...

chapter

Lattice-based unsupervised acoustic model training

Thiago Fraga-Silva, Jean-Luc Gauvain, Lori Lamel

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4656 - 4659

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Unsupervised acoustic model training has been successfully used to improve the performance of automatic speech recognition systems when only a small amount of manually transcribed data is available for the target domain. The most common approach is use automatic transcriptions to guide acoustic model estimation. However, since the best recognition hypotheses are known to contain errors, we propose...

chapter

Recent development of discriminative training using non-uniform criteria for cross-level acoustic modeling

Chao Weng, Biing-Hwang Juang

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5332 - 5335

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we extend our previous study on discriminative training using non-uniform criteria for speech recognition. The work will put emphasis on how the acoustic modeling interacts with the risk at a higher level, which is more relevant to the most used evaluation measures, e.g., word error rate(WER). To be specific, the non-uniform error cost is first derived at the word level to minimize...

chapter

Unsupervised domain adaptation with multiple acoustic models

Xin Lei, Wen Wang, Andreas Stolcke

2010 IEEE Spoken Language Technology Workshop > 247 - 252

2010 IEEE Spoken Language Technology Workshop (SLT 2010)

We investigate the problem of adapting a recognition system with multiple acoustic models to a new domain in unsupervised mode. We compare maximum likelihood and discriminative approaches for unsupervised domain adaptation. Different adaptation data selection methods and adaptation strategies are investigated, using a baseline meeting recognition system and adaptation data from a congressional committee...

Keywords:
TRAINING
ACOUSTICS
LATTICES

Publication date

Set your own date range

Keywords

SPEECH RECOGNITION (24)
HIDDEN MARKOV MODELS (16)
SPEECH (11)
ADAPTATION MODEL (6)
DECODING (6)
LEARNING (ARTIFICIAL INTELLIGENCE) (6)
COMPUTATIONAL MODELING (5)
DATA MODELS (5)
DISCRIMINATIVE TRAINING (5)
ACCURACY (4)
FEATURE EXTRACTION (4)
NEURAL NETWORKS (4)
CONTEXT (3)
ERROR ANALYSIS (3)
KEYWORD SEARCH (3)
NATURAL LANGUAGE PROCESSING (3)
SPOKEN TERM DETECTION (3)
WORD ERROR RATE (3)
ACOUSTIC MODEL (2)
ACOUSTIC MODELING (2)
ACOUSTIC SIGNAL PROCESSING (2)
ACTIVE LEARNING (2)
CONFUSION NETWORK (2)
CONNECTIONIST TEMPORAL CLASSIFICATION (2)
ENTROPY (2)
KEYWORD SPOTTING (2)
LATTICE (2)
MAXIMUM LIKELIHOOD ESTIMATION (2)
MULTILAYER PERCEPTRON (2)
MULTILAYER PERCEPTRONS (2)
PROBABILITY (2)
SEMI-SUPERVISED LEARNING (2)
SPEECH PROCESSING (2)
STANDARDS (2)
TRAINING DATA (2)
VOCABULARY (2)
ACCENTED ENGLISH (1)
ACHOLI (1)
ACOUSTIC MODEL PARAMETER (1)
ACOUSTIC MODELING APPROACH (1)
ACOUSTIC MODELLING (1)
ACTIVE LEARNING SAMPLE EVALUATION METHOD (1)
ADAPTATION MODELS (1)
ARABIC (1)
ARABIC SPEECH RECOGNITION (1)
ARTIFICIAL NEURAL NETWORKS (1)
ASR (1)
ASR ACCURACY (1)
AUDIO DATABASES (1)
AUXILIARY TRAINING (1)
BASELINE MEETING RECOGNITION SYSTEM (1)
BAYES METHODS (1)
BILINGUAL COURSE LECTURE (1)
BOOSTED MAXIMUM MUTUAL INFORMATION TRAINING (1)
BOOSTED MMI (1)
BUILDINGS (1)
C++ LANGUAGE (1)
CAMBRIDGE ARABIC SPEECH-TO-TEXT SYSTEMS (1)
CFRN (1)
CHINESE ACOUSTIC MODELING (1)
CHINESE LANGUAGE (1)
CNN (1)
COLLECTIVE INFORMATION (1)
COMPUTATION TIME (1)
COMPUTED TOMOGRAPHY (1)
COMPUTER NUMERICAL CONTROL (1)
CONDITIONAL PROBABILITY TABLE (1)
CONFIDENCE (1)
CONFIDENCE MEASURE (1)
CONFUSION MODELING (1)
CONGRESSIONAL COMMITTEE WEB SITE (1)
CONSORTIUM DEFINED TEST SETS (1)
CONTENT-BASED RETRIEVAL (1)
CONTEXT MODELING (1)
CONVERGENCE (1)
CTC (1)
CURRENT TRANSFORMERS (1)
DARPA GALE PROGRAM (1)
DATA MINING (1)
DEEP NEURAL NETWORK (1)
DEEP NEURAL NETWORKS (1)
DICTIONARIES (1)
DISCRIMINATIVE ADAPTATION (1)
DISCRIMINATIVE APPROACHES (1)
DISTANCE MEASUREMENT (1)
DOMAIN ADAPTATION (1)
DYNAMIC PROGRAMMING (1)
EDIT OPERATION MODELING (1)
END-TO-END SYSTEM (1)
ENTROPY REDUCTION (1)
EQUATIONS (1)
ESTIMATION (1)
FAULT CURRENTS (1)
FRAME RATE NORMALIZATION (1)
GAUSSIAN MIXTURE (1)
GAUSSIAN PROCESSES (1)
GRAPHEMIC SYSTEM (1)
more

INFONA - science communication portal

Search results

Radio-browsing for developmental monitoring in Uganda

Faster sequence training

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework

Training variance and performance evaluation of neural networks in speech

An LSTM-CTC based verification system for proxy-word based OOV keyword search

Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

Look, listen, and decode: Multimodal speech recognition with images

Lattice based transcription loss for end-to-end speech recognition

Hybrid context dependent CD-DNN-HMM Keyword Spotting (KWS) in speech conversations

A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training

Quality estimation for asr k-best list rescoring in spoken language translation

Improvements on transducing syllable lattice to word lattice for keyword search

WFST-based structural classification integrating dnn acoustic features and RNN language features for speech recognition

Towards better keyword search performance on Malay broadcast news data

Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers

Semi-supervised discriminative language modeling for Turkish ASR

Whole word discriminative point process models

Lattice-based unsupervised acoustic model training

Recent development of discriminative training using non-uniform criteria for cross-level acoustic modeling

Unsupervised domain adaptation with multiple acoustic models

Filter options

Publication date

Keywords

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options