Sibo Tong

chapter

Phone-aware LSTM-RNN for voice conversion

Jiahao Lai, Bo Chen, Tian Tan, Sibo Tong, et al.

2016 IEEE 13th International Conference on Signal Processing (ICSP), pp. 177-182

This paper investigates a new voice conversion technique using phone-aware Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). Most existing voice conversion methods, including Joint Density Gaussian Mixture Models (JDGMMs), Deep Neural Networks (DNNs) and Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs), only take acoustic information of speech as features to...

chapter

Multi-task joint-learning for robust voice activity detection

Yimeng Zhuang, Sibo Tong, Maofan Yin, Yanmin Qian, et al.

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5

Model-based VAD approaches have been widely used and have achieved success in practice. These approaches usually cast VAD as a frame-level classification problem and employ statistical classifiers, such as a Gaussian Mixture Model (GMM) or a Deep Neural Network (DNN), to assign a speech/silence label to each frame. Because classification assumes that frames are independent, the VAD results tend to be fragile....

chapter

A comparative study of robustness of deep learning approaches for VAD

Sibo Tong, Hao Gu, Kai Yu

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5695-5699

Voice activity detection (VAD) is an important step for real-world automatic speech recognition (ASR) systems. Deep learning approaches, such as DNN, RNN or CNN, have been widely used in model-based VAD. Although they have achieved success in practice, they are developed on different VAD tasks separately. Whilst VAD performance under noisy conditions, especially with unseen noise or very low SNR,...

chapter

Evaluating VAD for automatic speech recognition

Sibo Tong, Nanxin Chen, Yanmin Qian, Kai Yu

2014 12th International Conference on Signal Processing (ICSP 2014), pp. 2308-2314

Voice activity detection (VAD) plays a crucial role in speech processing, especially in automatic speech recognition (ASR). It identifies the boundaries of the speech to be recognized and the boundary accuracies may significantly affect the recognition performance. Conventional VAD evaluation criteria are mostly based on frame-level accuracy of speech/non-speech classification, which may result in...

INFONA - science communication portal

Search results for: Sibo Tong
