Search results for: Frank K. Soong

Items from 1 to 4 out of 4 results

article

Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems

Eunwoo Song, Frank K. Soong, Hong-Goo Kang

IEEE/ACM Transactions on Audio, Speech, and Language Processing > 2017 > 25 > 11 > 2152 - 2161

In this paper, we report research results on modeling the parameters of an improved time-frequency trajectory excitation (ITFTE) and spectral envelopes of an LPC vocoder with a long short-term memory (LSTM)-based recurrent neural network (RNN) for high-quality text-to-speech (TTS) systems. The ITFTE vocoder has been shown to significantly improve the perceptual quality of statistical parameter-based...

chapter

KL-divergence based mispronunciation detection via DNN and decision tree in the phonetic space

Wenping Hu, Frank K. Soong

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

We propose to detect mispronunciations in a language learner's speech via a discriminatively trained DNN in the phonetic space. The posterior probabilities of “senones” populated in a decision tree are trained and predicted speaker independently. Acoustic features of each input segment (with preceding and succeeding contexts of several frames) are mapped onto the whole set of senones in their corresponding...

chapter

Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis

Yuchen Fan, Yao Qian, Frank K. Soong, Lei He

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4475 - 4479

In DNN-based TTS synthesis, the DNN's hidden layers can be viewed as a deep transformation of linguistic features, and the output layers as a representation of the acoustic space that regresses the transformed linguistic features to acoustic parameters. The deep-layered architectures of DNNs can not only represent highly complex transformations compactly, but also take advantage of a huge amount of training data. In this...

chapter

Synthesizing visual speech trajectory with minimum generation error

Lijuan Wang, Yi-Jian Wu, Xiaodan Zhuang, Frank K. Soong

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4580 - 4583

In this paper, we propose a minimum generation error (MGE) training method to refine the audio-visual HMM to improve visual speech trajectory synthesis. Compared with the traditional maximum likelihood (ML) estimation, the proposed MGE training explicitly optimizes the quality of generated visual speech trajectory, where the audio-visual HMM modeling is jointly refined by using a heuristic method...
