Search results for: Dong Yu

Items from 1 to 7 out of 7 results

article

Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

Morten Kolbaek, Dong Yu, Zheng-Hua Tan, Jesper Jensen

IEEE/ACM Transactions on Audio, Speech, and Language Processing > 2017 > 25 > 10 > 1901 - 1913

In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an...

chapter

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Dong Yu, Morten Kolbaek, Zheng-Hua Tan, Jesper Jensen

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 241 - 245

We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker independent multi-talker speech separation, commonly known as the cocktail-party problem. Different from the multi-class regression technique and the deep clustering (DPCL) technique, our novel approach minimizes the separation error directly. This strategy effectively solves the long-lasting...

chapter

Prediction-adaptation-correction recurrent neural networks for low-resource language speech recognition

Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5415 - 5419

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition. A PAC-RNN is comprised of a pair of neural networks in which a correction network uses auxiliary information given by a prediction network to help estimate the state probability. The information from the correction network is also used by the prediction...

chapter

Speech recognition with prediction-adaptation-correction recurrent neural networks

Yu Zhang, Dong Yu, Michael L. Seltzer, Jasha Droppo

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5004 - 5008

We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on the past frames by a prediction DNN. The result from the main DNN is fed back to the prediction DNN to make better predictions for the future frames. In the PAC-RNN, we can consider that, given the new, current...

chapter

Single-channel mixed speech recognition using deep neural networks

Chao Weng, Dong Yu, Michael L. Seltzer, Jasha Droppo

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5632 - 5636

In this work, we study the problem of single-channel mixed speech recognition using deep neural networks (DNNs). Using a multi-style training strategy on artificially mixed speech data, we investigate several different training setups that enable the DNN to generalize to corresponding similar patterns in the test data. We also introduce a WFST-based two-talker decoder to work with the trained DNNs...

chapter

Recurrent deep neural networks for robust speech recognition

Chao Weng, Dong Yu, Shinji Watanabe, Biing-Hwang Fred Juang

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5532 - 5536

In this work, we propose recurrent deep neural networks (DNNs) for robust automatic speech recognition (ASR). Full recurrent connections are added to a certain hidden layer of a conventional feedforward DNN and allow the model to capture the temporal dependency in deep representations. A new backpropagation through time (BPTT) algorithm is introduced to make the minibatch stochastic gradient descent...

chapter

Scalable stacking and learning for building deep architectures

Li Deng, Dong Yu, John Platt

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2133 - 2136

Deep Neural Networks (DNNs) have shown remarkable success in pattern recognition tasks. However, parallelizing DNN training across computers has been difficult. We present the Deep Stacking Network (DSN), which overcomes the problem of parallelizing learning algorithms for deep architectures. The DSN provides a method of stacking simple processing modules in building deep architectures, with a convex...

INFONA - science communication portal
