Search results

Items from 1 to 20 out of 32 results

chapter

The phoneme set influence for lithuanian speech commands recognition accuracy

Mindaugas Greibus, Zivile Ringeliene, Laimutis Telksnys

2017 Open Conference of Electrical, Electronic and Information Sciences (eStream) > 1 - 4

2017 Open Conference of Electrical, Electronic and Information Sciences (eStream)

The phoneme set influence for Lithuanian speech commands recognition accuracy is investigated. Four phoneme sets are discussed. LIEPA speech corpus for training of Acoustic Model is used. The phonetic representation of corpus transcriptions is generated by grapheme-to-phoneme transformation rules. Rule based transformations for Lithuanian language is proposed. Recognition engine with CMU Pocketsphinx...

chapter

Discriminative importance weighting of augmented training data for acoustic model training

Sunit Sivasankaran, Emmanuel Vincent, Irina Illina

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4885 - 4889

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

DNN based acoustic models require a large amount of training data. Parametric data augmentation techniques such as adding noise, reverberation, or changing the speech rate, are often employed to boost the dataset size and the ASR performance. The choice of augmentation techniques and the associated parameters has been handled heuristically so far. In this work we propose an algorithm to automatically...

chapter

Towards automatic assessment of aphasia speech using automatic speech recognition techniques

Ying Qin, Tan Lee, Anthony Pak Hin Kong, Sam Po Law

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 4

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Aphasia is a type of acquired language impairment caused by brain injury. This paper presents an automatic speech recognition (ASR) based approach to objective assessment of aphasia patients. A dedicated ASR system is developed to facilitate acoustical and linguistic analysis of Cantonese aphasia speech. The acoustic models and the language models are trained with domain- and style-matched speech...

chapter

Lattice based transcription loss for end-to-end speech recognition

Jian Kang, Wei-Qiang Zhang, Jia Liu

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

End-to-end speech recognition systems have been successfully implemented and have become competitive replacements for hybrid systems. A common loss function to train end-to-end systems is connectionist temporal classification (CTC). This method maximizes the log likelihood between the feature sequence and the associated transcription sequence. However there are some weaknesses with CTC training. The...

chapter

Articulatory and spectrum features integration using generalized distillation framework

Jianguo Yu, Konstantin Markov, Tomoko Matsui

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 6

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

It has been shown that by combining the acoustic and articulatory information significant performance improvements in automatic speech recognition (ASR) task can be achieved. In practice, however, articulatory information is not available during recognition and the general approach is to estimate it from the acoustic signal. In this paper, we propose a different approach based on the generalized distillation...

chapter

Application of artificial neural network in Geology: Porosity estimation and lithological facies classification

Suihong Son, Jiagen Hou, Yuming Liu, Sifan Cao, more

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 740 - 744

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

Based on the relationship between porosity (or lithological facies) and other petrophysical properties, Artificial neural networks (ANN) are respectively trained for porosity estimation and lithological facies classification, using core porosity (CPOR) data and core lithological facies interpretation results of part of core interval together with some well logs (petrophysical properties). After the...

chapter

Impact of phonetic annotation precision on automatic speech recognition systems

Radek Safarik, Lukas Mateju

2016 39th International Conference on Telecommunications and Signal Processing (TSP) > 311 - 314

2016 39th International Conference on Telecommunications and Signal Processing (TSP)

In this paper we study the impact of phonetic annotation precision on the accuracy of a state-of-the art ASR (automatic speech recognition) system. This issue becomes important especially if we want to port the system to a new language without spending much time by collecting, checking and annotating a large amount of acoustic data in the target language. First, we describe a series of experiments...

chapter

Deep convolutional neural networks for acoustic modeling in low resource languages

William Chan, Ian Lane

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2056 - 2060

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Convolutional Neural Networks (CNNs) have demonstrated powerful acoustic modelling capabilities due to their ability to account for structural locality in the feature space; and in recent works CNNs have been shown to often outperform fully connected Deep Neural Networks (DNNs) on TIMIT and LVCSR. In this paper, we perform a detailed empirical study of CNNs under the low resource condition, wherein...

chapter

Improving multiple-crowd-sourced transcriptions using a speech recogniser

R. C. van Dalen, K. M. Knill, P. Tsiakoulis, M. J. F. Gales

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4709 - 4713

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is...

chapter

Crim's French speech transcription system for ETAPE 2011

Vishwa Gupta, Gilles Boulianne, Frederic Osterrath, Pierre Ouellet

2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA) > 351 - 356

2013 8th InternationalWorkshop on Systems, Signal Processing and their Applications (WoSSPA)

This paper describes the French broadcast speech transcription system by CRIM for the ETAPE 2011 evaluation. The key elements in this recognizer include over 140,000-word dictionary, 478 hours of audio for training the acoustic models, feature-space MMI and boosted MMI discriminative training of the acoustic models, variable-frame-rate decoding with trigram language model, lattice rescoring with quadgram...

chapter

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

Denis Jouvet, Dominique Fohr, Irina Illina

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4821 - 4824

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper deals with the evaluation of grapheme-to-phoneme (G2P) converters in a speech recognition context. The precision and recall rates are investigated as potential measures of the quality of the multiple generated pronunciation variants. Very different results are obtained whether or not we take into account the frequency of occurrence of the words. Since G2P systems are rarely evaluated on...

chapter

Combining transcription-based and acoustic-based speaker identifications for broadcast news

Elie El Khoury, Antoine Laurent, Sylvain Meignier, Simon Petitrenaud

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4377 - 4380

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we consider the issue of speaker identification within audio records of broadcast news. The speaker identity information is extracted from both transcript-based and acoustic-based speaker identification systems. This information is combined in the belief functions framework, which makes coherent the knowledge representation of the problem. The Kuhn-Munkres algorithm is used to optimize...

chapter

Affine invariant sparse maximum a posteriori adaptation

Peder A. Olsen, Jing Huang, Steven J. Rennie, Vaibhava Goel

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4317 - 4320

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Modern speech applications utilize acoustic models with billions of parameters, and serve millions of users. Storing an acoustic model for each user is costly. We show through the use of sparse regularization, that it is possible to obtain competitive adaptation performance by changing only a small fraction of the parameters of an acoustic model. This allows for the compression of speaker-dependent...

chapter

Design of multi-feature class models for Speech Recognition Security systems with under-resourced languages

N. Barroso, K. Lopez de Ipina, C. Hernandez, A. Ezeiza

2011 Carnahan Conference on Security Technology > 1 - 6

2011 International Carnahan Conference on Security Technology (ICCST)

One of the goals of Speech Recognition Security (SRS) systems is to have appropriately tools to recognize speech password spoken based on elements such as words, sub-word or speakers. The main goal of the present work is to design robust ASR systems based on alternative ways to the classical evaluation rates, which often depend on the vocabulary of the task and on the language resources available...

chapter

Fast computation of Gaussian likelihoods using low-rank matrix approximations

Mrugesh R. Gajjar, T. V. Sreenivas, R. Govindarajan

2011 IEEE Workshop on Signal Processing Systems (SiPS) > 322 - 327

2011 IEEE Workshop on Signal Processing Systems (SiPS)

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors...

chapter

Development of a Chinese song name recognition system

Shang Cai, Zhen Zhang, Ta Li, Jielin Pan, more

2011 Seventh International Conference on Natural Computation > 2 > 941 - 945

2011 Seventh International Conference on Natural Computation (ICNC)

The development of automatic speech recognition (ASR) technology in recent years has made it possible for some intelligent query systems to use a voice interface. Automatic song selection is a practical and interesting application of ASR. In this paper we describe our efforts to build and improve a Chinese song name recognition system. It is a large vocabulary, speaker-independent system currently...

chapter

PAC-Bayesian approach for minimization of phoneme error rate

Joseph Keshet, David McAllester, Tamir Hazan

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2224 - 2227

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We describe a new approach for phoneme recognition which aims at minimizing the phoneme error rate. Building on structured prediction techniques, we formulate the phoneme recognizer as a linear combination of feature functions. We state a PAC-Bayesian generalization bound, which gives an upper-bound on the expected phoneme error rate in terms of the empirical phoneme error rate. Our algorithm is derived...

chapter

A novel decision function and the associated decision-feedback learning for speech translation

Yaodong Zhang, Li Deng, Xiaodong He, Alex Acero

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5608 - 5611

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper we report our recent development of an end-to-end integrative design methodology for speech translation. Specifically, a novel decision function is proposed based on the Bayesian analysis, and the associated discriminative learning technique is presented based on the decision-feedback principle. The decision function in our end-to-end design methodology integrates acoustic scores, language...

chapter

Arccosine kernels: Acoustic modeling with infinite neural networks

Chih-Chieh Cheng, Brian Kingsbury

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5200 - 5203

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Neural networks are a useful alternative to Gaussian mixture models for acoustic modeling; however, training multilayer networks involves a difficult, nonconvex optimization that requires some “art” to make work well in practice. In this paper we investigate the use of arccosine kernels for speech recognition, using these kernels in a hybrid support vector machine/hidden Markov model recognition system...

chapter

Pronunciation variation modeling of non-native proper names by discriminative tree search

Line Adde, Torbjorn Svendsen

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4928 - 4931

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, the task of selecting the optimal subset of pronunciation variants from a set of automatically generated candidates is recast as a tree search problem. In this approach, the optimal recognition lexicon corresponds with the optimal path through a search tree. We define a discriminative evaluation function to guide the search algorithm, which is based on estimates of the number of recognition...

Keywords:
ACOUSTICS
ERROR ANALYSIS

Publication date

Set your own date range

Keywords

SPEECH RECOGNITION (27)
HIDDEN MARKOV MODELS (23)
SPEECH (14)
DISCRIMINATIVE TRAINING (6)
AUTOMATIC SPEECH RECOGNITION (4)
DECODING (4)
NEURAL NETWORKS (4)
ARTIFICIAL NEURAL NETWORKS (3)
DATA MODELS (3)
LATTICES (3)
MATHEMATICAL MODEL (3)
MAXIMUM LIKELIHOOD ESTIMATION (3)
NATURAL LANGUAGE PROCESSING (3)
TRAINING DATA (3)
ACOUSTIC MODEL (2)
ACOUSTIC SIGNAL PROCESSING (2)
ARTIFICIAL NEURAL NETWORK (2)
AUTOMATIC SPEECH RECOGNITION SYSTEMS (2)
BAYESIAN METHODS (2)
COMPUTATIONAL MODELING (2)
DATABASES (2)
GAUSSIAN MIXTURE (2)
KERNEL (2)
LEARNING (ARTIFICIAL INTELLIGENCE) (2)
MAXIMUM ENTROPY (2)
MINIMUM CLASSIFICATION ERROR (2)
NEURAL NETS (2)
PROBABILITY (2)
PRONUNCIATION VARIATION MODELING (2)
VOCABULARY (2)
ACCURACY (1)
ACOUSTIC ARRAYS (1)
ACOUSTIC LIKELIHOOD COMPUTATIONS (1)
ACOUSTIC MODEL TRAINING (1)
ACOUSTIC MODEL TRAINING PARAMETER (1)
ACOUSTIC MONITORING (1)
ACOUSTIC RECORDINGS (1)
ACOUSTIC SIGNAL DETECTION (1)
ACTIVE LEARNING (1)
ACTIVE LEARNING SAMPLE EVALUATION METHOD (1)
ADAPTATION MODELS (1)
ALTERNATIVE PRONUNCIATION VARIANTS (1)
APHASIA SPEECH (1)
APPROXIMATION METHODS (1)
ARRAYS (1)
ARTICULATORY FEATURES (1)
ASR (1)
AUTOMATED DETECTION (1)
AUTOMATIC DETECTION (1)
AUTOMATIC SEGMENTATION (1)
AUTOMATIC TRANSCRIPTION (1)
AUTONOMOUS SIGNAL DETECTION (1)
BAYESIAN PRIOR (1)
BELIEF FUNCTIONS (1)
BENCHMARK EVALUATION (1)
BENCHMARK TESTING (1)
BILINGUAL PHONEME INVENTORY CONSTRUCTION (1)
BILINGUAL SPEECH RECOGNITION (1)
CELLPHONE APPLICATION (1)
CHIME (1)
CHINESE ACOUSTIC MODELING (1)
CLASS-CONDITIONAL PROBABILITY (1)
CLASSIFICATION ALGORITHMS (1)
CLASSIFICATION REGRESSION TREE (1)
CLOSED CAPTIONING (1)
COMBINATORIAL CLASSIFIER (1)
COMBINATORIAL MATHEMATICS (1)
COMPUTER BASED TRAINING (1)
CONFIDENCE WEIGHTED (1)
CONFIDENCE-WEIGHTED LEARNING (1)
CONFUSION NETWORK (1)
CONNECTIONIST TEMPORAL CLASSIFICATION (1)
CONTEXT (1)
CONTINUOUS SPEECH RECOGNITION SYSTEM (1)
CONVERTERS (1)
CONVEX OPTIMIZATION (1)
CONVOLUTION (1)
CONVOLUTIONAL NEURAL NETWORKS (1)
CORPUS BUILDING (1)
COVARIANCE MATRIX (1)
CROWD-SOURCING (1)
DATA AUGMENTATION (1)
DATA DEPENDENT SPARSE FEATURES (1)
DECISION FEEDBACK (1)
DEEP NEURAL NETWORK (1)
DEEP NEURAL NETWORKS (1)
DETECTION PROBABILITY (1)
DISCRIMINATIVE TRAINING ALGORITHM (1)
DISCRIMINATIVE TRAINING TECHNIQUES (1)
DISTANCE MEASUREMENT (1)
DNN (1)
DNN-HMM ACOUSTIC MODEL (1)
DRIVEN DECODING ALGORITHM (1)
ELASTIC NET (1)
EM STYLE AUXILIARY FUNCTION (1)
EMISSION PROBABILITIES (1)
END-TO-END SYSTEM (1)
more

INFONA - science communication portal

Search results

The phoneme set influence for lithuanian speech commands recognition accuracy

Discriminative importance weighting of augmented training data for acoustic model training

Towards automatic assessment of aphasia speech using automatic speech recognition techniques

Lattice based transcription loss for end-to-end speech recognition

Articulatory and spectrum features integration using generalized distillation framework

Application of artificial neural network in Geology: Porosity estimation and lithological facies classification

Impact of phonetic annotation precision on automatic speech recognition systems

Deep convolutional neural networks for acoustic modeling in low resource languages

Improving multiple-crowd-sourced transcriptions using a speech recogniser

Crim's French speech transcription system for ETAPE 2011

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

Combining transcription-based and acoustic-based speaker identifications for broadcast news

Affine invariant sparse maximum a posteriori adaptation

Design of multi-feature class models for Speech Recognition Security systems with under-resourced languages

Fast computation of Gaussian likelihoods using low-rank matrix approximations

Development of a Chinese song name recognition system

PAC-Bayesian approach for minimization of phoneme error rate

A novel decision function and the associated decision-feedback learning for speech translation

Arccosine kernels: Acoustic modeling with infinite neural networks

Pronunciation variation modeling of non-native proper names by discriminative tree search

Filter options

Publication date

Keywords

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options