Search results for: Yishuang Ning

Items from 1 to 9 out of 9 results

chapter

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data

Yishuang Ning, Zhiyong Wu, Runnan Li, Jia Jia, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5615 - 5619

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) has achieved state-of-the-art performance in many sequence processing problems given its capability in capturing contextual information. However, for languages with a limited amount of training data, it is still difficult to obtain a high-quality BLSTM model for emphasis detection, the aim of which is to recognize the emphasized...

chapter

Inferring emotions from heterogeneous social media data: A Cross-media Auto-Encoder solution

Shumei Zhang, Jia Jia, Yishuang Ning

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2891 - 2895

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Social media has been rocking the world in recent years, which makes modeling social media content important. However, the heterogeneity of social media data is the main constraint. This paper focuses on inferring emotions from large-scale social media data. Tweets on social media platforms, always containing heterogeneous information from different combinations of modalities, are utilized to construct a...

chapter

DBLSTM-based multi-scale fusion for dynamic emotion prediction in music

Xinxing Li, Jiashen Tian, Mingxing Xu, Yishuang Ning, more

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

2016 IEEE International Conference on Multimedia and Expo (ICME)

Dynamic Music Emotion Prediction is crucial to the emerging applications of music retrieval and recommendation. Considering the influence of temporal context and hierarchical structure on emotion in music, we propose a Deep Bidirectional Long Short-Term Memory (DBLSTM) based multi-scale regression method. In this method, a post-processing component is utilised for individual DBLSTM output to further...

chapter

Inferring users' emotions for human-mobile voice dialogue applications

Boya Wu, Jia Jia, Tao He, Juan Du, more

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

2016 IEEE International Conference on Multimedia and Expo (ICME)

In this paper, we tackle the problem of inferring users' emotions in real-world Voice Dialogue Applications (VDAs, e.g., Siri and Cortana). We first conduct an investigation, indicating that besides the text information of users' queries, the acoustic information and query attributes are very important in inferring emotions in VDAs. To integrate the information above, we propose a Hybrid Emotion Inference...

chapter

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar

Xinyu Lan, Xu Li, Yishuang Ning, Zhiyong Wu, more

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5550 - 5554

ICASSP 2016 - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speech is bimodal in nature. There are close correlations between acoustic speech signals and visual gestures such as lip movements, facial expressions and head motions. For a speech-driven talking avatar, how to derive more representative acoustic features from which to predict more accurate and realistic visual gestures remains an open research problem. Inspired by the promising performance...

chapter

Understanding speaking styles of internet speech data with LSTM and low-resource training

Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, more

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 815 - 820

2015 International Conference on Affective Computing and Intelligent Interaction (ACII)

Speech is widely used to express one's emotion, intention, desire, etc. in social network communication, yielding abundant internet speech data with different speaking styles. Such data provides a good resource for social multimedia research. However, because different styles are mixed together in the internet speech data, how to classify such data remains a challenging problem. In previous...

chapter

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training

Yishuang Ning, Zhiyong Wu, Jia Jia, Fanbo Meng, more

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4934 - 4938

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates the incorporation of hidden Markov model (HMM) based emphatic speech synthesis for audio exaggeration into an audio-visual speech synthesis framework for the corrective feedback in computer-aided pronunciation training (CAPT). To improve the voice quality of the synthetic emphatic speech, this paper proposes a new method for HMM training. In this method, the contextual questions...

article

Generating emphatic speech with hidden Markov model for expressive speech synthesis

Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia, more

Multimedia Tools and Applications > 2015 > 74 > 22 > 9909-9925

Emphasis plays an important role in expressive speech synthesis by highlighting the focus of an utterance to draw the attention of the listener. As there are only a few emphasized words in a sentence, data limitation is one of the most important problems for emphatic speech synthesis. In this paper, we analyze contrastive (neutral versus emphatic) speech recordings considering kinds...

chapter

Automatic detection of contrastive word pairs using textual and acoustic features

Xiao Zang, Zhiyong Wu, Yishuang Ning, Helen Meng, more

2014 12th International Conference on Signal Processing (ICSP) > 594 - 598

2014 12th International Conference on Signal Processing (ICSP 2014)

Labeling emphatic words from speech recordings plays an important role in building a speech corpus for expressive speech synthesis. People generally pronounce some words more strongly than usual, making the speech more expressive and signaling the focus of the sentence. Contrastive word pairs are often pronounced with stronger prominence and their presence modifies the meaning of the utterance in subtle...
