When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
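The naive interpolation mentioned above is typically a log-linear combination of the CTC acoustic score and the LM score. A minimal sketch of that scoring rule follows; the weights and log-probabilities are invented for illustration and are not values from the paper:

```python
def interpolated_score(ctc_logprob, lm_logprob, lm_weight=0.7,
                       word_count=1, word_insertion_penalty=0.0):
    """Naive log-linear interpolation of a CTC acoustic-model score and a
    language-model score, as commonly used in CTC-based LVCSR decoding."""
    return (ctc_logprob
            + lm_weight * lm_logprob
            + word_insertion_penalty * word_count)

# Compare two competing hypotheses for the same audio: hypothesis "a" has a
# slightly worse acoustic score but a much better LM score, so it wins.
hyp_a = interpolated_score(ctc_logprob=-12.0, lm_logprob=-4.0)
hyp_b = interpolated_score(ctc_logprob=-11.5, lm_logprob=-9.0)
best = max([("a", hyp_a), ("b", hyp_b)], key=lambda kv: kv[1])[0]
```

The lack of theoretical justification criticized in the abstract refers to exactly this kind of ad hoc weighting: the CTC score already implicitly contains a label prior, so simply adding a scaled LM score double-counts prior information.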
It is very important to exploit abundant unlabeled speech to improve acoustic model training in automatic speech recognition (ASR). Semi-supervised training methods incorporate unlabeled data in addition to labeled data to enhance model training, but they face the problem of error-prone labels. The ensemble training scheme trains a set of models and combines them to make the model more...
Multilingual (ML) representations play a key role in building speech recognition systems for low-resource languages. The IARPA-sponsored BABEL program focuses on building automatic speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism for deriving ML representations in the BABEL program has been the use of a two-stage network,...
Building an Automatic Speech Recognition (ASR) system requires an acoustic model, a language model, and a dictionary for the intended language; this also applies to Indonesian ASR. In this paper, an Indonesian ASR system was built using the CMUSphinx toolkit (a Hidden Markov Model based ASR tool) with a limited dataset. We use a digit corpus and our own language model, trained on the limited dataset. We also investigated the implementation...
In this paper we study the impact of phonetic annotation precision on the accuracy of a state-of-the-art ASR (automatic speech recognition) system. This issue becomes especially important if we want to port the system to a new language without spending much time collecting, checking, and annotating a large amount of acoustic data in the target language. First, we describe a series of experiments...
In this paper, we investigate methods to improve the recognition performance of low-resource languages with limited training data by borrowing subspace parameters from a high-resource language in subspace Gaussian mixture model (SGMM) framework. As a first step, only the state-specific vectors are updated using low-resource language, while retaining all the globally shared parameters from the high-resource...
In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize our tools, procedures and data previously designed and collected for other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin to simplify...
This paper presents a deep recurrent regularization neural network (DRRNN) for speech recognition. Our idea is to build a regularized neural network acoustic model by combining Tikhonov and weight-decay regularization, which compensates for variations due to the input speech as well as the model parameters in the restricted Boltzmann machine used as a pre-training stage for feature learning...
Currently, most of the acoustic model selection work is done empirically or heuristically or even arbitrarily. In this paper, Genetic Algorithm (GA) based and Particle Swarm Optimization (PSO) based algorithms that consider the number of states and the kernel numbers for the states simultaneously and reject the uniform allocation of Gaussian kernels are proposed to automatically optimize acoustic...
In this paper, we propose a robust classification strategy for distinguishing between a healthy subject and a patient with pulmonary emphysema on the basis of lung sounds. A symptom of pulmonary emphysema is that almost all lung sounds include some abnormal (i.e., adventitious) sounds. However, the great variety of possible adventitious sounds and noises at auscultation makes high-accuracy detection...
In enclosed environments where robots are deployed, the observed speech signal is smeared due to reverberation. This degrades the performance of automatic speech recognition (ASR). Thus, hands-free speech recognition for human-machine communication is a difficult task. Most speech enhancement techniques used to address this problem enhance the contaminated waveform independently of that of the...
Applications of automatic speech recognition (ASR) have been extended to a variety of tasks and domains, including spontaneous human-human speech. We have developed an ASR system for the Japanese Parliament (Diet), which was deployed this year. By exploiting official records made by human stenographers, we have realized an efficient training scheme for acoustic and language models, which does not require...
Active Learning (AL) is designed to aid the labor-intensive process of training acoustic models for speech recognition. In AL, only the most informative training samples are selected for manual annotation. Thus, how to evaluate the unlabeled samples is worth investigating. In this paper, we propose a unified framework to generate confusion networks at multiple levels, including character, syllable and...
This paper addresses the design and implementation of automatic speaker verification (ASV) systems. There is great interest in developing and increasing the performance of ASV applications, taking into account the advantages offered when compared to other biometrical methods. State-of-the-art speaker recognizers are based on statistical models such as GMM, HMM, SVM, ANN or hybrid models. This work...
This paper introduces a speech recognition system of Mandarin continuous digits based on Sphinx. The acoustic model of this system is produced by SphinxTrain, and the language model is acquired from the Cmuclmtk statistical language model. In addition, this system makes use of PocketSphinx recognition engine as a decoder. According to the experiment, the correct rate of this system to a sentence of...
It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multilayer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions,...
We propose a committee-based active learning method for large vocabulary continuous speech recognition. In this approach, multiple recognizers are prepared beforehand, and the recognition results obtained from them are used for selecting utterances. Here, a progressive search method is used for aligning sentences, and voting entropy is used as a measure for selecting utterances. We apply our method...
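The voting-entropy criterion above can be sketched compactly: each committee member proposes a transcription for an utterance, and the entropy of the vote distribution measures disagreement. The utterance IDs and label strings below are hypothetical, and real systems compare aligned hypotheses rather than whole strings:

```python
import math
from collections import Counter

def voting_entropy(committee_labels):
    """Entropy of the votes a committee of recognizers cast for one
    utterance: higher entropy means more disagreement, hence a more
    informative sample to send for manual annotation."""
    k = len(committee_labels)
    counts = Counter(committee_labels)
    return -sum((v / k) * math.log(v / k) for v in counts.values())

# When all recognizers agree, the entropy is zero.
assert voting_entropy(["hello world"] * 3) == 0.0

# Select the utterance the committee disagrees on most.
utts = {
    "utt1": ["a b c", "a b c", "a b c"],   # full agreement
    "utt2": ["a b c", "a b d", "x y z"],   # total disagreement
}
most_informative = max(utts, key=lambda u: voting_entropy(utts[u]))
```

With three distinct hypotheses, the entropy reaches its maximum of log 3, so `utt2` would be selected for labeling.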
In this paper, we present a novel version of discriminative training for N-gram language models. Language models impose language specific constraints on the acoustic hypothesis and are crucial in discriminating between competing acoustic hypotheses. As reported in the literature, discriminative training of acoustic models has yielded significant improvements in the performance of a speech recognition...
From statistical learning theory, the generalization capability of a model is the ability to generalize well on unseen test data which follow the same distribution as the training data. This paper investigates how generalization capability can also improve robustness when testing and training data are from different distributions in the context of speech recognition. Two discriminative training (DT)...
Automatic generation of punctuation is an essential feature for many speech-to-text transcription tasks. This paper describes a maximum a-posteriori (MAP) approach for inserting punctuation marks into raw word sequences obtained from automatic speech recognition (ASR). The system consists of an "acoustic model" (AM) for prosodic features (actually pause duration) and a "language model" (LM) for...
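The MAP combination of a prosodic score and an LM score can be illustrated with a toy decision rule; the candidate marks and log-scores below are invented for illustration and the actual models in the paper are more elaborate:

```python
def map_punctuation(pause_loglik, lm_logprob, candidates):
    """Pick the punctuation mark (or no mark, "") maximizing the
    a-posteriori score log P(pause | mark) + log P(words, mark) --
    a toy sketch of a MAP combination, not the paper's exact model."""
    return max(candidates, key=lambda m: pause_loglik[m] + lm_logprob[m])

# Hypothetical scores after a word followed by a long pause: the prosodic
# model strongly favors a sentence boundary, the LM mildly disfavors it.
pause_loglik = {"": -3.0, ",": -1.2, ".": -0.8}
lm_logprob   = {"": -0.5, ",": -1.0, ".": -1.1}
best = map_punctuation(pause_loglik, lm_logprob, ["", ",", "."])
```

Here the combined score for "." (-1.9) beats "," (-2.2) and no mark (-3.5), so the long pause tips the MAP decision toward a full stop despite the LM's mild preference for continuing the sentence.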