Search results for: Li Rong

Items from 1 to 12 out of 12 results

chapter

Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis

Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4915 - 4919

This paper presents a method to extract structural spectral features from spectral envelopes using what-where autoencoders (WWAE) for statistical parametric speech synthesis (SPSS). A WWAE is constructed by concatenating a convolutional net for input encoding and a deconvolutional net for reconstruction. The output values of the max-pooling layer in the encoder and the positions of the max-pooling...

chapter

Lip movement generation using restricted Boltzmann machines for visual speech synthesis

Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 606 - 610

This paper proposes methods of using restricted Boltzmann machines (RBM) to generate the sequence of lip images for visual speech synthesis. The aim of our proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach for lip synthesis. Two model structures using RBMs to model and generate lip movements are investigated in this...

chapter

Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis

Xin Wang, Zhen-Hua Ling, Li-Rong Dai

2012 8th International Symposium on Chinese Spoken Language Processing > 84 - 87

2012 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012)

In our previous work, we have presented a cross-stream dependency modeling method for hidden Markov model (HMM) based parametric speech synthesis. In this method, multi-space probability distribution (MSD) was adopted for F0 modeling and the voicing decision error influenced the accuracy of generated spectral features severely. Therefore, a cross-stream dependency modeling method using continuous...

chapter

Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis

Ming Lei, Zhen-Hua Ling, Li-Rong Dai

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4712 - 4715

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Ordering property is an important property of LSP and closely connected with the naturalness of reconstructed speech. When LSP is adopted as spectrum feature in HMM-based parametric speech synthesis, the ordering property cannot be guaranteed because diagonal covariance matrix is used in conventional system and the cross-dimension correlation of LSP vector is ignored. It will cause unstable issue...

chapter

Minimum generation error training for HMM-based prediction of articulatory movements

Tian-Yi Zhao, Zhen-Hua Ling, Ming Lei, Li-Rong Dai, et al.

2010 7th International Symposium on Chinese Spoken Language Processing > 99 - 102

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

This paper presents a minimum generation error (MGE) training method for hidden Markov model (HMM) based prediction of articulatory movements when both text and audio inputs are given. In this method, MGE criterion is adopted to replace the maximum likelihood (ML) criterion to estimate model parameters for the unified acoustic-articulatory HMMs. Different from the MGE training for HMM-based acoustic...

chapter

Speaker verification against synthetic speech

Lian-Wu Chen, Wu Guo, Li-Rong Dai

2010 7th International Symposium on Chinese Spoken Language Processing > 309 - 312

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

With the development of the HMM-based parametric speech synthesis algorithm, it is easy for impostors to generate the synthetic speech with specific speaker's characteristics, which is a serious threat to the state of the art speaker verification system. In this paper, we investigate the difference of Mel-cepstral (MCEP) between the natural and synthetic speech. Experiments demonstrate that we can...

chapter

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

Zhen-Hua Ling, Zhi-Guo Wang, Li-Rong Dai

2010 7th International Symposium on Chinese Spoken Language Processing > 144 - 147

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

In the current hidden Markov model (HMM) based unit selection speech synthesis method, the optimal phone-sized candidate units are selected following the maximum likelihood (ML) criterion of the HMMs trained for various acoustic features. This paper introduces the statistical models for syllable-level F0 features into this method. Different from the frame-level F0 parameters used in the current framework,...

chapter

Automatic phrase boundary labeling for Mandarin TTS corpus using context-dependent HMM

Chen-Yu Yang, Zhen-Hua Ling, Heng Lu, Wu Guo, et al.

2010 7th International Symposium on Chinese Spoken Language Processing > 374 - 377

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

In this paper, an automatic prosodic phrase boundary labeling method for speech synthesis database is presented. This method can be divided into two stages: training stage and labeling stage. In training stage, context-dependent HMM, which is commonly adopted in the HMM-based parametric speech synthesis, is estimated using the training database with manual prosodic labeling. In labeling stage, the...

chapter

Investigation of prosodic F0 layers in hierarchical F0 modeling for HMM-based speech synthesis

Ming Lei, Yi-Jian Wu, Zhen-Hua Ling, Li-Rong Dai

IEEE 10th International Conference on Signal Processing Proceedings > 613 - 616

2010 10th International Conference on Signal Processing (ICSP 2010)

To address the overall-micro modeling issue of current prosody model in HMM-based speech synthesis, a hierarchical F0 modeling method has been proposed, in which different kinds of pitch patterns are characterized by different prosodic layers and a minimum generation error (MGE) training framework is used to simultaneously optimize F0 models of all layers. This paper investigates the importance of...

chapter

Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis

Ming Lei, Zhen-Hua Ling, Li-Rong Dai

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 4230 - 4233

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

This paper presents a minimum generation error (MGE) training method using weighted Euclidean distance measure on line spectral pairs (LSP) for HMM-based speech synthesis. In this paper, weighted Euclidean distance on LSP is introduced as the measurement of generation error to improve the consistency between the model training criterion and the subjective perception on the distortion of synthetic...

chapter

HMM-based pseudo-clean speech synthesis for splice algorithm

Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 4570 - 4573

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In this paper, we present a novel approach to relax the constraint of stereo-data which is needed in a series of algorithms for noise-robust speech recognition. As a demonstration in SPLICE algorithm, we generate the pseudo-clean features to replace the ideal clean features from one of the stereo channels, by using HMM-based speech synthesis. Experimental results on Aurora2 database show that the...

chapter

Full covariance state duration modeling for HMM-based speech synthesis

Heng Lu, Yi-Jian Wu, K. Tokuda, Li-Rong Dai, et al.

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4033 - 4036

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper proposes a state duration modeling method using full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix instead of the conventional diagonal covariance matrix is adopted in the multi-dimensional Gaussian distribution to model the state duration of each context-dependent phoneme. At synthesis stage, the state durations are predicted using the clustered...

