Deep Neural Networks (DNNs) are currently the dominant technique in English and Chinese speech recognition. However, Tibetan speech recognition research started late and still relies mainly on Hidden Markov Models (HMMs). In this paper, we show that replacing Gaussian Mixture Models (GMMs) with DNNs improves a Tibetan Lhasa dialect speech recognition system. The system contains seven layers of features...
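In a hybrid DNN-HMM system of the kind this abstract describes, the DNN replaces the GMM by emitting state posteriors, which are divided by state priors to give scaled likelihoods for the HMM decoder. A minimal sketch with toy dimensions and random weights (illustrative only, not the paper's actual seven-layer system):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dnn_state_posteriors(x, weights, biases):
    """Forward pass of a small MLP: ReLU hidden layers, softmax output
    over HMM states (the role the GMM likelihoods used to play)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)                # ReLU hidden layer
    return softmax(h @ weights[-1] + biases[-1])      # state posteriors

rng = np.random.default_rng(0)
dims = [39, 64, 64, 10]    # MFCC input -> 2 hidden layers -> 10 states (toy sizes)
Ws = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims[:-1], dims[1:])]
bs = [np.zeros(b) for b in dims[1:]]

post = dnn_state_posteriors(rng.standard_normal((5, 39)), Ws, bs)
priors = np.full(10, 0.1)
scaled_lik = post / priors  # "scaled likelihoods" passed to the HMM decoder
```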
In this paper, we continue our work on linear least squares based adaptation (LLS) for deep neural networks. We show that our previously proposed algorithm is a special case of an optimization algorithm called Alternating Direction Method of Multipliers (ADMM). We demonstrate that the adaptation algorithm can improve the performance on various deep neural networks including the bidirectional long...
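The snippet does not give details of the ADMM connection, but ADMM itself can be illustrated on a standard problem. A sketch of ADMM for the lasso (a textbook example, not the paper's adaptation algorithm):

```python
import numpy as np

def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """ADMM for the lasso: min (1/2)||Ax-b||^2 + lam*||z||_1  s.t. x = z."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.inv(AtA + rho * np.eye(n))   # cached solve for the x-update
    for _ in range(iters):
        x = L @ (Atb + rho * (z - u))          # x-update: ridge-like solve
        z = soft_threshold(x + u, lam / rho)   # z-update: proximal map of l1
        u = u + x - z                          # scaled dual update
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.1)
```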
Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation-evolution strategy (CMA-ES), which...
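CMA-ES itself adapts a full covariance matrix; a much-simplified (mu/mu, lambda) evolution strategy conveys the basic sample-rank-recombine loop it builds on. A toy sketch with a fixed step-size schedule and no covariance adaptation (illustrative only):

```python
import numpy as np

def simple_es(f, x0, sigma=1.0, lam=20, mu=5, iters=100, seed=0):
    """A much-simplified (mu/mu, lambda) evolution strategy.
    Real CMA-ES additionally adapts a full covariance matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    for _ in range(iters):
        pop = x + sigma * rng.standard_normal((lam, x.size))  # sample offspring
        fit = np.array([f(p) for p in pop])
        elite = pop[np.argsort(fit)[:mu]]     # keep the mu best
        x = elite.mean(axis=0)                # recombine by averaging
        sigma *= 0.97                         # crude step-size schedule
    return x

sphere = lambda v: float(np.sum(v * v))
best = simple_es(sphere, x0=[3.0, -2.0, 1.0])
```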
A deep neural network (DNN) is trained with mini-batch optimization based on the stochastic gradient descent algorithm. Such stochastic learning suffers from instability in parameter updates and may easily become trapped in a local optimum. This study addresses the stability of stochastic learning by reducing the variance of gradients in the optimization procedure. We upgrade the optimization from the...
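One standard way to reduce gradient variance in stochastic learning is SVRG, which corrects each stochastic gradient with a periodically recomputed full gradient. A sketch on a least-squares toy problem (illustrative; the snippet does not specify the paper's exact method):

```python
import numpy as np

def svrg(A, b, lr=0.01, epochs=20, seed=0):
    """SVRG: each stochastic gradient is corrected by the same sample's
    gradient at a snapshot plus the full gradient at that snapshot."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n        # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - b[i])             # stochastic grad at w
            gi_snap = A[i] * (A[i] @ w_snap - b[i])   # same sample at snapshot
            w = w - lr * (gi - gi_snap + full_grad)   # variance-reduced step
    return w

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 5))
w_true = np.arange(1.0, 6.0)
b = A @ w_true
w_hat = svrg(A, b)
```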
This paper describes an enhancement strategy based on several perceptual-assessment criteria for dereverberation algorithms. The complete procedure is applied to an algorithm for reverberant speech enhancement based on single-channel blind spectral subtraction. This enhancement was implemented by combining different quality measures, namely the so-called QAreverb, the speech-to-reverberation modulation...
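Single-channel spectral subtraction, the base algorithm mentioned here, can be sketched in its simplest oracle form, where the noise magnitude spectrum is given (real systems estimate it from non-speech frames):

```python
import numpy as np

def spectral_subtract(noisy, noise_est, alpha=1.0, floor=0.01):
    """Spectral subtraction: subtract an estimated noise magnitude
    spectrum, keep the noisy phase, and floor the result."""
    spec = np.fft.rfft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = np.abs(np.fft.rfft(noise_est))
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)       # toy "speech": a 50 Hz tone
noise = 0.3 * rng.standard_normal(1024)
noisy = clean + noise
enhanced = spectral_subtract(noisy, noise)
```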
Linear Discriminant Analysis (LDA) has been applied successfully to speech recognition tasks, improving accuracy and robustness against some types of noise. However, it is well known that LDA suffers from some weaknesses if the distributions are not unimodal or when the means of the distributions are shared. In this paper, we propose to take advantage of the nonlinear discriminant properties of the...
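For reference, two-class Fisher LDA computes the projection w = Sw^{-1}(m1 - m0); it is exactly this reliance on class means and pooled scatter that breaks down for multimodal or shared-mean classes. A minimal sketch:

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher LDA: w = Sw^{-1}(m1 - m0), maximizing
    between-class over within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
X0 = rng.standard_normal((200, 3))                          # class 0 at origin
X1 = rng.standard_normal((200, 3)) + np.array([3.0, 0, 0])  # class 1 shifted
w = fisher_lda_direction(X0, X1)
```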
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization...
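Submodular optimization for data or keyword selection is typically solved greedily, which carries a (1 - 1/e) approximation guarantee for monotone submodular objectives. A toy coverage example (not the paper's actual objective):

```python
def greedy_max_coverage(sets, k):
    """Greedy maximization of a monotone submodular coverage function:
    repeatedly pick the set with the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)),
                   key=lambda i: len(sets[i] - covered))
        if len(sets[best] - covered) == 0:
            break                       # no remaining gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, k=2)
```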
This article proposes and evaluates a Gaussian Mixture Model (GMM) represented as the last layer of a Deep Neural Network (DNN) architecture and jointly optimized with all previous layers using Asynchronous Stochastic Gradient Descent (ASGD). The resulting “Deep GMM” architecture was investigated with special attention to the following issues: (1) The extent to which joint optimization improves over...
In this paper we present an investigation of sequence-discriminative training of deep neural networks for automatic speech recognition. We evaluate different sequence-discriminative training criteria (MMI and MPE) and optimization algorithms (including SGD and Rprop) using the RASR toolkit. Further, we compare the training of the whole network with that of the output layer only. Technical details...
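Rprop, one of the optimization algorithms compared, adapts a per-parameter step size from the sign of successive gradients and ignores the gradient magnitude. A minimal variant on a toy quadratic (no weight backtracking):

```python
import numpy as np

def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=1.0):
    """One Rprop update: grow the step when the gradient sign repeats,
    shrink it on a sign change; move by -sign(grad) * step."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    return -np.sign(grad) * step, step

# minimize f(w) = ||w||^2 with Rprop
w = np.array([4.0, -3.0])
step = np.full(2, 0.1)
prev_grad = np.zeros(2)
for _ in range(60):
    grad = 2 * w
    delta, step = rprop_step(grad, prev_grad, step)
    w = w + delta
    prev_grad = grad
```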
This paper proposes a statistical methodology based on evolving Fuzzy-rule-based (FRB) classifiers to develop dialog managers for spoken dialog systems. The dialog managers developed by means of our proposal select the next system action by considering a set of dynamic rules that are automatically obtained by means of the application of the FRB classification process. Our approach has the main advantage...
The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum...
With the aim of achieving a computationally efficient optimization of kernel-based probabilistic models for various problems, such as sequential pattern recognition, we have already developed the kernel gradient matching pursuit method as an approximation technique for kernel-based classification. The conventional kernel gradient matching pursuit method approximates the optimal parameter vector by...
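Matching pursuit in its generic (non-kernel) form greedily selects the dictionary atom most correlated with the current residual. A sketch on a random unit-norm dictionary (illustrative only):

```python
import numpy as np

def matching_pursuit(D, y, n_atoms):
    """Greedy matching pursuit: at each step pick the atom most
    correlated with the residual and subtract its contribution."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual            # atoms assumed unit-norm
        i = np.argmax(np.abs(corr))
        coeffs[i] += corr[i]
        residual -= corr[i] * D[:, i]
    return coeffs, residual

rng = np.random.default_rng(5)
D = rng.standard_normal((30, 12))
D /= np.linalg.norm(D, axis=0)           # normalize atoms
y = 2.0 * D[:, 3] - 1.5 * D[:, 8]        # sparse combination of two atoms
coeffs, residual = matching_pursuit(D, y, n_atoms=10)
```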
Automatic speech recognition (ASR) is an enabling technology for a wide range of information processing applications including speech translation, voice search (i.e., information retrieval with speech input), and conversational understanding. In these speech-centric applications, the output of ASR as “noisy” text is fed into down-stream processing systems to accomplish the designated tasks of translation,...
In agglutinative languages, the selection of lexical units is not obvious. Morpheme units are usually adopted to ensure sufficient coverage, but many morphemes are short, resulting in weak constraints and possible confusion. In this paper, we propose a discriminative approach to selecting lexical entries that directly contribute to ASR error reduction. We define an evaluation function for each word...
Training a fuzzy neural network (FNN) is an optimization task that seeks optimal centers for the membership functions and optimal weights. Traditional training algorithms have drawbacks such as getting stuck in local minima and high computational complexity. This work presents an FNN trained by the artificial bee colony (ABC) optimization algorithm, which has good exploration and exploitation capabilities...
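A simplified artificial bee colony loop on a toy function shows the three phases the algorithm is built from: employed bees perturb food sources, onlookers favor good ones, and scouts replace stale ones. The tournament onlooker selection here is a simplification of the usual fitness-proportional rule:

```python
import random

def abc_minimize(f, dim, bounds, n_food=10, limit=20, iters=200, seed=0):
    """Simplified artificial bee colony minimization, memorizing the
    globally best food source ever seen."""
    rng = random.Random(seed)
    lo, hi = bounds
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_food)]
    fits = [f(x) for x in foods]
    trials = [0] * n_food
    best_x, best_f = None, float("inf")

    def try_neighbor(i):
        j = rng.randrange(dim)
        k = rng.choice([p for p in range(n_food) if p != i])
        cand = foods[i][:]
        cand[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        fc = f(cand)
        if fc < fits[i]:
            foods[i], fits[i], trials[i] = cand, fc, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):                  # employed bee phase
            try_neighbor(i)
        for _ in range(n_food):                  # onlooker phase
            i = min(rng.randrange(n_food), rng.randrange(n_food),
                    key=lambda p: fits[p])       # biased toward good sources
            try_neighbor(i)
        for i in range(n_food):                  # scout phase
            if trials[i] > limit:
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                fits[i], trials[i] = f(foods[i]), 0
        for i in range(n_food):                  # remember the global best
            if fits[i] < best_f:
                best_x, best_f = foods[i][:], fits[i]
    return best_x, best_f

sphere = lambda x: sum(v * v for v in x)
best_x, best_f = abc_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```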
Speech translation (ST) is an enabling technology for cross-lingual oral communication. A ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Nowadays, most ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors at the surface level. It does not consider the contextual and syntactic roles of...
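WER, the criterion being questioned here, is the word-level Levenshtein distance (substitutions + deletions + insertions) normalized by the reference length. A standard implementation:

```python
def word_error_rate(ref, hyp):
    """WER via Levenshtein distance over word sequences."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```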
We introduce the Line Search A-Function (LSAF) technique that generalizes the Extended-Baum Welch technique in order to provide an effective optimization technique for a broader set of functions. We show how LSAF can be applied to functions of various probability density and distribution functions by demonstrating that these probability functions have an A-function. We also show that sparse representation...
This paper presents a novel method for reducing the dimensionality of kernel spaces. Recently, to maintain the convexity of training, log-linear models without mixtures have been used as emission probability density functions in hidden Markov models for automatic speech recognition. In that framework, nonlinearly-transformed high-dimensional features are used to achieve the nonlinear classification...
Log-linear acoustic models have been shown to be competitive with Gaussian mixture models in speech recognition. Their high training time can be reduced by feature selection. We compare a simple univariate feature selection algorithm with ReliefF - an efficient multivariate algorithm. An alternative to feature selection is ℓ1-regularized training, which leads to sparse models. We observe that this...
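ℓ1-regularized training produces sparse models because the proximal step soft-thresholds small weights to exactly zero, an implicit form of feature selection. A sketch with logistic regression on synthetic data (illustrative; not the paper's acoustic model):

```python
import numpy as np

def l1_logreg_prox(X, y, lam=0.05, lr=0.1, iters=500):
    """l1-regularized logistic regression by proximal gradient descent:
    a gradient step on the log-loss, then soft-thresholding, which
    zeroes out uninformative features."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y) / n
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox of l1
    return w

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(float)   # only two features matter
w = l1_logreg_prox(X, y)
```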
Metamodels are a technique developed to model a speaker's phoneme confusion matrix and use this information to increase speech recognition accuracy for speakers with disordered or normal speech. Approaches to improving the performance of metamodels, mainly focused on obtaining better estimates of the speaker's confusion matrix, were studied. While some achieved significant improvements,...
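The core idea of a confusion-matrix correction can be sketched as Bayes inversion: given P(recognized | spoken) for a speaker, pick the spoken phoneme maximizing P(spoken | recognized). A toy example with hypothetical probabilities (not the paper's metamodel):

```python
def correct_phoneme(recognized, confusion, priors):
    """Pick the spoken phoneme s maximizing
    P(s | recognized) ∝ P(recognized | s) * P(s)."""
    scores = {s: confusion[s].get(recognized, 0.0) * priors[s]
              for s in confusion}
    return max(scores, key=scores.get)

# hypothetical confusion matrix: this speaker often realizes /t/ as /d/
confusion = {
    "t": {"t": 0.4, "d": 0.6},
    "d": {"d": 0.9, "t": 0.1},
    "s": {"s": 1.0},
}
priors = {"t": 0.5, "d": 0.2, "s": 0.3}
best = correct_phoneme("d", confusion, priors)  # recognizer output was /d/
```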