One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. We therefore considered clustering the speech based on pitch (fundamental frequency, F0) to create a larger distance between the phonemes. In addition, we considered a two-stage training method for DNN-HMM: the first stage is trained using conventional acoustic features...
This paper investigates the application of unsupervised acoustic unit discovery for topic identification (topic ID) of spoken audio documents. The acoustic unit discovery method is based on a non-parametric Bayesian phone-loop model that segments a speech utterance into phone-like categories. The discovered phone-like (acoustic) units are further fed into the conventional topic ID framework. Using...
Sub-word units such as morphemes are selected as the lexicon for highly inflectional languages, as they provide better coverage and a smaller vocabulary size. However, short units shrink the context available to statistical models, are prone to morpho-phonetic changes, and do not always outperform the word-based model. When sequences of units are merged or split, unit boundaries are phonetically harmonized in the...
In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize our tools, procedures and data previously designed and collected for other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin to simplify...
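The Cyrillic-to-Latin conversion step mentioned above can be illustrated with a simple character-mapping sketch. The mapping table below is one common romanization scheme chosen for illustration; it is not necessarily the scheme the authors used.

```python
# Minimal Cyrillic-to-Latin transliteration sketch for Russian text.
# The mapping is one common romanization scheme, chosen for illustration only.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "",
    "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

def translit(text: str) -> str:
    """Map each Cyrillic character to its Latin equivalent; pass others through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text.lower())

print(translit("привет"))  # -> privet
```

A table-driven converter like this lets text tools built for Latin-script Slavic languages (Czech, Slovak) be reused on the romanized Russian corpus without modification.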
Standard automatic speech recognition (ASR) systems use phoneme-based pronunciation lexicons prepared by linguistic experts. When the hand-crafted pronunciations fail to cover the vocabulary of a new domain, a grapheme-to-phoneme (G2P) converter is used to extract pronunciations for new words, and then a phoneme-based ASR system is trained. G2P converters are typically trained only on the existing lexicons...
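The lexicon-with-fallback setup this abstract describes can be sketched as follows. The toy lexicon entries and the trivial one-letter-per-unit graphemic fallback are illustrative stand-ins; a real system would call a trained G2P converter for out-of-vocabulary words.

```python
# Sketch: consult a hand-crafted lexicon first; for out-of-vocabulary (OOV)
# words, fall back to a graphemic pronunciation (one unit per letter).
# A real system would substitute a trained G2P converter for the fallback.
LEXICON = {  # hypothetical expert-written entries (ARPAbet-style symbols)
    "speech": ["s", "p", "iy", "ch"],
    "model": ["m", "aa", "d", "ah", "l"],
}

def pronounce(word: str) -> list[str]:
    """Return a pronunciation: lexicon entry if present, graphemes otherwise."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return list(word)  # graphemic fallback for OOV words

print(pronounce("speech"))  # -> ['s', 'p', 'iy', 'ch']
print(pronounce("asr"))     # -> ['a', 's', 'r']
```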
In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource languages in telephony speech...
In this paper we describe approaches to building our recent Malay broadcast news audio retrieval system. This system contains speech-to-text and keyword search subsystems. The speech-to-text system is built with two aims: hybrid-vocabulary recognition to tackle the out-of-vocabulary keyword search issue, and diversified acoustic modeling for effective system combination in subsequent keyword searching...
In this paper, we investigate the ability of a recently proposed discriminatively trained, multi-level context-dependent acoustic model to adapt to a new speaker in both supervised and unsupervised adaptation scenarios. Speaker adaptive speech recognition experiments performed on a large-vocabulary spoken lecture task show that the multi-level model reduces word error rates by more than 10% in both...
In prior work, we proposed a method for vocabulary acquisition based on a co-occurrence model and non-negative matrix factorization. The vocabulary is described in terms of co-occurrence statistics of frame-level acoustic descriptions and suffers from poor scalability to larger vocabularies. Much like whole-word HMM models, there is no reuse of sub-word units such as phone models. In this paper,...
A speech recognition system that automatically learns word models for a small vocabulary from examples of its usage, without using prior linguistic information, can be of great use in cognitive robotics, human-machine interfaces, and assistive devices. In the latter case, the user's speech capabilities may also be affected. In this paper, we consider a NMF-based learning framework capable of doing...
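The non-negative matrix factorization at the core of these NMF-based learning frameworks can be sketched in a few lines. The tiny matrix, rank, and plain Lee-Seung multiplicative updates below are illustrative assumptions; the cited work factors co-occurrence statistics of acoustic descriptions, not this toy data.

```python
import random

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_err(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    R = matmul(W, H)
    return sum((V[i][j] - R[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

def nmf(V, r, iters=200, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x r) times H (r x n) using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    random.seed(0)  # deterministic init for the sketch
    m, n = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[random.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n)]
             for i in range(r)]
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H

# Toy co-occurrence-like matrix; its rank-2 structure is recoverable with r=2.
V = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
W, H = nmf(V, r=2)
print(frob_err(V, W, H))  # small reconstruction error after training
```

Because the updates are multiplicative, W and H stay non-negative throughout, which is what makes the learned columns interpretable as additive word (or sub-word) patterns.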
For large-vocabulary continuous speech recognition (LVCSR) of highly-inflected languages, selection of an appropriate recognition unit is the first important step. The morpheme-based approach is often adopted because of its high coverage and linguistic properties. But morpheme units are short, often consisting of one or two phonemes, thus they are more likely to be confused in ASR than word units...
The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice...
This paper investigates the use of phoneme class conditional probabilities as features (posterior features) for template-based ASR. Using 75-word and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate the use of different posterior distribution estimators, different distance measures that are better suited for posterior distributions, and different training...
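Distance measures suited to posterior distributions, of the kind compared above, can be sketched concretely. Symmetrised KL divergence and the Bhattacharyya distance are two standard choices for comparing probability vectors; whether these are the exact measures the paper evaluates is an assumption.

```python
import math

def sym_kl(p, q, eps=1e-12):
    """Symmetrised Kullback-Leibler divergence between two posterior vectors."""
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))

def bhattacharyya(p, q):
    """Bhattacharyya distance: negative log of the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(x * y) for x, y in zip(p, q))
    return -math.log(bc)

# Toy posteriors over three phoneme classes (illustrative values).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(sym_kl(p, q))
print(bhattacharyya(p, q))
```

Both measures are zero for identical distributions and grow with disagreement, unlike a plain Euclidean distance, which ignores the probabilistic nature of the features.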
In this paper, the task of selecting the optimal subset of pronunciation variants from a set of automatically generated candidates is recast as a tree search problem. In this approach, the optimal recognition lexicon corresponds with the optimal path through a search tree. We define a discriminative evaluation function to guide the search algorithm, which is based on estimates of the number of recognition...
The large amount of variation present in native speakers' pronunciation of non-native proper names is a big challenge for most automatic speech recognition systems today. The recognizer's ability to handle a variety of different pronunciations is therefore critical to achieve an acceptable recognition performance for this task. This problem has traditionally been solved by including alternative pronunciation...
This paper investigates unsupervised vocabulary and language model self-adaptation (VLA) from just one speech file, using the web as a knowledge source and without prior knowledge of topic or domain beyond optional file metadata. Single-file self-adaptation is regularly used for acoustic adaptation but, to date, is rarely used for VLA. The method investigated here uses a first-pass transcript or file...
This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that Multi-Layer Perceptron (MLP) features trained on phonetic targets can improve the performance of both phonemic and graphemic systems. Also, a morphological decomposition scheme is extended from the graphemic domain to the phonetic domain,...
This paper focuses on a comparison of two continuous-space language modeling techniques, namely Tied-Mixture Language Modeling (TMLM) and Neural Network based Language Modeling (NNLM). Additionally, we report on using alternative feature representations for words and histories used in TMLM. Besides bigram co-occurrence based features, we consider using NNLM-based input features for training TMLMs. We...
We propose a committee-based active learning method for large vocabulary continuous speech recognition. In this approach, multiple recognizers are prepared beforehand, and the recognition results obtained from them are used for selecting utterances. Here, a progressive search method is used for aligning sentences, and voting entropy is used as a measure for selecting utterances. We apply our method...
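The voting-entropy selection criterion described above can be sketched as follows. The toy hypotheses and whole-sentence voting are illustrative simplifications; the abstract applies voting after a progressive-search alignment of the recognizers' outputs.

```python
import math
from collections import Counter

def vote_entropy(hypotheses):
    """Entropy (bits) of the committee's votes for one utterance.
    `hypotheses` holds one recognition result per committee member."""
    k = len(hypotheses)
    counts = Counter(hypotheses)
    return -sum((c / k) * math.log2(c / k) for c in counts.values())

def select_utterances(committee_outputs, n):
    """Pick the n utterances the committee disagrees on most (highest entropy),
    i.e. those most informative to transcribe next."""
    scored = sorted(committee_outputs.items(),
                    key=lambda kv: vote_entropy(kv[1]), reverse=True)
    return [utt_id for utt_id, _ in scored[:n]]

# Toy committee of three recognizers over three utterances.
outputs = {
    "utt1": ["hello world", "hello world", "hello world"],   # full agreement
    "utt2": ["hello word", "hello world", "yellow world"],   # full disagreement
    "utt3": ["good morning", "good morning", "good mourning"],
}
print(select_utterances(outputs, 1))  # -> ['utt2']
```

Utterances where the recognizers all agree score zero entropy and are skipped; the budget for manual transcription goes to the utterances the committee finds hardest.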
Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the...