Xiong Xiao

chapter

Beamforming networks using spatial covariance features for far-field speech recognition

Xiong Xiao, Shinji Watanabe, Eng Siong Chng, Haizhou Li

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-6

Recently, a deep beamforming (BF) network was proposed to predict BF weights from phase-carrying features, such as the generalized cross correlation (GCC). The BF network is trained jointly with the acoustic model to minimize the automatic speech recognition (ASR) cost function. In this paper, we propose to replace GCC with features derived from the input signals' spatial covariance matrices (SCM), which contain...

chapter

I-vector based deep neural network acoustic model adaptation using multilingual language resource

Haihua Xu, Wei Rao, Xiong Xiao, Hao Huang, et al.

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-5

I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvements for speech recognition. In this paper, we study this technique on the Babel task. We use Swahili as the target language (50 hours of training data) and six other languages as multilingual resources, each used to train its own i-vector extractor. Our study shows that i-vector extractors trained with more multilingual data only...

chapter

Spoofing speech detection using temporal convolutional neural network

Xiaohai Tian, Xiong Xiao, Eng Siong Chng, Haizhou Li

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-6

Spoofing speech detection aims to differentiate spoofing speech from natural speech. Most previous works use frame-based features. Although multiple frames or dynamic features are combined into a super-vector to represent temporal information, the time span covered by these features is not sufficient. Most systems failed to detect the non-vocoder or unit selection based...

chapter

Neural networks based channel compensation for i-vector speaker verification

Wei Rao, Xiong Xiao, Chenglin Xu, Haihua Xu, et al.

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5

Linear discriminant analysis (LDA) and Gaussian probabilistic LDA (PLDA) have been shown to effectively suppress channel and session variability of i-vectors. However, they suffer from the following limitations: 1) in LDA, a single linear transformation may not be adequate to describe the nonlinear relationship of features, and 2) Gaussian PLDA assumes the speaker and channel factors follow a Gaussian distribution,...

chapter

A spectrum smoothing method for speaker verification

Zhaofeng Zhang, Jing Deng, Longbiao Wang, Xiong Xiao

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1291-1295

In speech processing, the speech signal is usually processed frame by frame because speech is non-stationary. In this paper, a frequency-domain averaging based frame smoothing method is proposed. Besides the conventional frame shift, we introduce a short time shift to create several frames around the current frame. We then average the power spectra of these frames. The average...

chapter

On the study of very low-resource language keyword search

Van Tung Pham, Haihua Xu, Van Hai Do, Tze Yuang Chong, et al.

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 358-364

In this paper, we report our approaches to the very-limited-resource keyword search (KWS) task in the NIST Open Keyword Search 2015 (OpenKWS15) Evaluation. We devised methods, first, to attain better acoustic modeling through multilingual, semi-supervised, and exemplar-based acoustic model training; second, to address the overwhelming out-of-vocabulary...

chapter

Detecting synthetic speech using long term magnitude and phase information

Xiaohai Tian, Steven Du, Xiong Xiao, Haihua Xu, et al.

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), pp. 611-615

Synthetic speech is speech generated by text-to-speech (TTS) and voice conversion (VC) techniques. It poses a threat to speaker verification (SV) systems, as an attacker may use TTS or VC to synthesize a speaker's voice and fool the SV system. To address this challenge, we study the detection of synthetic speech using long-term magnitude and phase information of speech. As most of...

chapter

DNN feature compensation for noise robust speaker verification

Steven Du, Xiong Xiao, Eng Siong Chng

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), pp. 871-875

The speaker verification (SV) task has been an active area of research over the last thirty years. One recent research topic is improving the robustness of SV systems in challenging environments. This paper examines the robustness of current state-of-the-art SV systems against background noise corruption. Specifically, we consider the scenario where the SV system is trained on noise-free...

chapter

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news

Guangpu Huang, Chenglin Xu, Xiong Xiao, Lei Xie, et al.

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-9

This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We propose a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and use them together in the DNN-CRF model to predict sentence boundaries. We tested the accuracy of the multi-view features...

article

Temporal Structure Normalization of Speech Feature for Robust Speech Recognition

Xiong Xiao, Eng Siong Chng, Haizhou Li

IEEE Signal Processing Letters, vol. 14, no. 7, pp. 500-503, 2007

This letter presents a new feature normalization technique to normalize the temporal structure of speech features. The temporal structure of the features is partially represented by their power spectral density (PSD). We observed that the PSD of the features varies with the corrupting noise and the signal-to-noise ratio. To reduce the PSD variation due to noise, we propose to normalize the PSD of features...

INFONA - science communication portal

Search results for: Xiong Xiao
