Search results for: Daniel Povey

Items from 1 to 20 out of 34 results

chapter

A study on data augmentation of reverberant speech for robust speech recognition

Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5220 - 5224

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real...

chapter

Speaker diarization using deep neural network embeddings

Daniel Garcia-Romero, David Snyder, Gregory Sell, Daniel Povey, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4930 - 4934

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker diarization is an important front-end for many speech technologies in the presence of multiple speakers, but current methods that employ i-vector clustering for short segments of speech are potentially too cumbersome and costly for the front-end role. In this work, we propose an alternative approach for learning representations via deep neural networks to remove the i-vector extraction process...

chapter

Deep neural network-based speaker embeddings for end-to-end speaker verification

David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, more

2016 IEEE Spoken Language Technology Workshop (SLT) > 165 - 170

2016 IEEE Spoken Language Technology Workshop (SLT)

In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification,...

chapter

Acoustic data-driven pronunciation lexicon generation for logographic languages

Guoguo Chen, Daniel Povey, Sanjeev Khudanpur

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5350 - 5354

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Handcrafted pronunciation lexicons are widely used in modern speech recognition systems. Designing a pronunciation lexicon, however, requires tremendous amount of expert knowledge and effort, which is not practical when applying speech recognition techniques to low resource languages. In this paper, we are interested in developing speech recognition systems for logographic languages with only a small...

chapter

JHU ASpIRE system: Robust LVCSR with TDNNS, iVector adaptation and RNN-LMS

Vijayaditya Peddinti, Guoguo Chen, Vimal Manohar, Tom Ko, more

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 539 - 546

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Multi-style training, using data which emulates a variety of possible test scenarios, is a popular approach towards robust acoustic modeling. However acoustic models capable of exploiting large amounts of training data in a comparatively short amount of training time are essential. In this paper we tackle the problem of reverberant speech recognition using 5500 hours of simulated reverberant data...

chapter

Time delay deep neural network-based universal background models for speaker recognition

David Snyder, Daniel Garcia-Romero, Daniel Povey

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) > 92 - 97

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Recently, deep neural networks (DNN) have been incorporated into i-vector-based speaker recognition systems, where they have significantly improved state-of-the-art performance. In these systems, a DNN is used to collect sufficient statistics for i-vector extraction. In this study, the DNN is a recently developed time delay deep neural network (TDNN) that has achieved promising results in LVCSR tasks...

chapter

Librispeech: An ASR corpus based on public domain audio books

Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5206 - 5210

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language...

chapter

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks

Daniel Garcia-Romero, Xiaohui Zhang, Alan McCree, Daniel Povey

2014 IEEE Spoken Language Technology Workshop (SLT) > 378 - 383

2014 IEEE Spoken Language Technology Workshop (SLT)

Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained...

chapter

Multilingual deep neural network based acoustic modeling for rapid language adaptation

Ngoc Thang Vu, David Imseng, Daniel Povey, Petr Motlicek, more

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 7639 - 7643

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents a study on multilingual deep neural network (DNN) based acoustic modeling and its application to new languages. We investigate the effect of phone merging on multilingual DNN in context of rapid language adaptation. Moreover, the combination of multilingual DNNs with Kullback-Leibler divergence based acoustic modeling (KL-HMM) is explored. Using ten different languages from the...

chapter

Improving deep neural network acoustic models using generalized maxout networks

Xiaohui Zhang, Jan Trmal, Daniel Povey, Sanjeev Khudanpur

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 215 - 219

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the...

chapter

Some insights from translating conversational telephone speech

Gaurav Kumar, Matt Post, Daniel Povey, Sanjeev Khudanpur

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3231 - 3235

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We report insights from translating Spanish conversational telephone speech into English text by cascading an automatic speech recognition (ASR) system with a statistical machine translation (SMT) system. The key new insight is that the informal register of conversational speech is a greater challenge for ASR than for SMT: the BLEU score for translating the reference transcript is 64%, but drops to...

chapter

A pitch extraction algorithm tuned for automatic speech recognition

Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, more

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2494 - 2498

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper we present an algorithm that produces pitch and probability-of-voicing estimates for use as features in automatic speech recognition systems. These features give large performance improvements on tonal languages for ASR systems, and even substantial improvements for non-tonal languages. Our method, which we are calling the Kaldi pitch tracker (because we are adding it to the Kaldi ASR...

chapter

Using proxies for OOV keywords in the keyword search task

Guoguo Chen, Oguz Yilmaz, Jan Trmal, Daniel Povey, more

2013 IEEE Workshop on Automatic Speech Recognition and Understanding > 416 - 421

2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

We propose a simple but effective weighted finite state transducer (WFST) based framework for handling out-of-vocabulary (OOV) keywords in a speech search task. State-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) systems are developed for conversational telephone speech in Tagalog. Word-based and phone-based indexes are created from word lattices, the latter...

chapter

Feature and score level combination of subspace Gaussians in LVCSR task

Petr Motlicek, Daniel Povey, Martin Karafiat

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 7604 - 7608

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we investigate employment of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, first, we focus on exploiting various types of complex features estimated using neural network combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs...

chapter

Quantifying the value of pronunciation lexicons for keyword search in low resource languages

Guoguo Chen, Sanjeev Khudanpur, Daniel Povey, Jan Trmal, more

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 8560 - 8564

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper quantifies the value of pronunciation lexicons in large vocabulary continuous speech recognition (LVCSR) systems that support keyword search (KWS) in low resource languages. State-of-the-art LVCSR and KWS systems are developed for conversational telephone speech in Tagalog, and the baseline lexicon is augmented via three different grapheme-to-phoneme models that yield increasing coverage...

chapter

Combining forward and backward search in decoding

Mirko Hannemann, Daniel Povey, Geoffrey Zweig

2013 IEEE International Conference on Acoustics, Speech and Signal Processing > 6739 - 6743

ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We introduce a speed-up for weighted finite state transducer (WFST) based decoders, which is based on the idea that one decoding pass using a wider beam can be replaced by two decoding passes with smaller beams, decoding forward and backward in time. We apply this in a decoder that works with a variable beam width, which is widened in areas where the two decoding passes disagree. Experimental results...

chapter

Modeling gender dependency in the Subspace GMM framework

Ngoc Thang Vu, Tanja Schultz, Daniel Povey

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4345 - 4348

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

The Subspace GMM acoustic model has both globally shared parameters and parameters specific to acoustic states, and this makes it possible to do various kinds of tying. In the past we have investigated sharing the global parameters among systems with distinct acoustic states; this can be useful in a multilingual setting. In the current paper we investigate the reverse idea: to have different global...

chapter

Revisiting Recurrent Neural Networks for robust ASR

Oriol Vinyals, Suman V. Ravuri, Daniel Povey

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4085 - 4088

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM. We apply...

chapter

Generating exact lattices in the WFST framework

Daniel Povey, Mirko Hannemann, Gilles Boulianne, Lukas Burget, more

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4213 - 4216

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead above one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is “fully expanded”, i.e. where the arcs correspond to HMM transitions...

article

A basis representation of constrained MLLR transforms for robust adaptation

Daniel Povey, Kaisheng Yao

Computer Speech & Language > 2012 > 26 > 1 > 35-51

Constrained Maximum Likelihood Linear Regression (CMLLR) is a speaker adaptation method for speech recognition that can be realized as a feature-space transformation. In its original form it does not work well when the amount of speech available for adaptation is less than about 5s, because of the difficulty of robustly estimating the parameters of the transformation matrix. In this paper we describe...

INFONA - science communication portal
