Lei Xie

chapter

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5645 - 5649

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...

chapter

Predicting articulatory movement from text using deep architecture with stacked bottleneck features

Zhen Wei, Zhizheng Wu, Lei Xie

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Using speech or text to predict articulatory movements can have potential benefits for speech related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which is much more than the exploration for predicting articulatory movements from text. In this paper, we investigate the feasibility of using deep neural network (DNN) for articulartory movement...

chapter

On the training of DNN-based average voice model for speech synthesis

Shan Yang, Zhizheng Wu, Lei Xie

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 6

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Adaptability and controllability are the major advantages of statistical parametric speech synthesis (SPSS) over unit-selection synthesis. Recently, deep neural networks (DNNs) have significantly improved the performance of SPSS. However, current studies are mainly focusing on the training of speaker-dependent DNNs, which generally requires a significant amount of data from a single speaker. In this...

chapter

Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese

Jingyong Hou, Lei Xie, Zhonghua Fu

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

We use query-by-example keyword spotting (QbyE-KWS) approach to solve the personalized wake-up word detection problem for small-footprint, low-computational cost on-device applications. QbyE-KWS takes keywords as templates, and matches the templates across an audio stream via DTW to see if the keyword is included. In this paper, we use neural networks as acoustic models to extract DNN/LSTM phoneme...

chapter

A density peak clustering approach to unsupervised acoustic subword units discovery

Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng, more

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 178 - 183

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper studies unsupervised acoustic units discovery from unlabelled speech data. This task is usually approached by two steps, i.e., partitioning speech utterances into segments and clustering these segments into subword categories. In previous approaches, the clustering step usually assumes the number of subword units are known beforehand, which is unreasonable for zero-resource languages. Moreover,...

chapter

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching

Haihua Xu, Peng Yang, Xiong Xiao, Lei Xie, more

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5191 - 5195

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we propose a partial sequence matching based symbolic search (SS) method for the task of language independent query-by-example spoken term detection. One main drawback of conventional SS approach is the high miss rate for long queries. This is due to high variations in symbol representation of query and search audios, especially in language independent scenario. The successful matching...

chapter

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news

Guangpu Huang, Chenglin Xu, Xiong Xiao, Lei Xie, more

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific > 1 - 9

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We proposed a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and used them together in the DNN-CRF model to predict the sentence boundaries. We tested the accuracy of the multi-view features...

chapter

Acoustic TextTiling for story segmentation of spoken documents

Lilei Zheng, Cheung-Chi Leung, Lei Xie, Bin Ma, more

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5121 - 5124

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Different from most of the existing methods using LVCSR transcripts, this method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define the acoustic similarity measure...

chapter

Detection of ball hits in a tennis game using audio and visual information

Qiang Huang, Stephen Cox, Xiangzeng Zhou, Lei Xie

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 10

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

In this paper we describe a framework to improve the detection of ball hit events in tennis games by combining audio and visual information. Detection of the presence and timing of these events is crucial for the understanding of the game. However, neither modality on its own gives satisfactory results: audio information is often corrupted by noise and also suffers from acoustic mismatch between the...

INFONA - science communication portal

Search results for: Lei Xie

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection

Predicting articulatory movement from text using deep architecture with stacked bottleneck features

On the training of DNN-based average voice model for speech synthesis

Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese

A density peak clustering approach to unsupervised acoustic subword units discovery

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news

Acoustic TextTiling for story segmentation of spoken documents

Detection of ball hits in a tennis game using audio and visual information

Filter options

Publication date

Keywords

INFONA - science communication portal

Search results for: Lei Xie

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options