We present a semi-supervised learning (SSL) method for building domain-specific language models (LMs) from general-domain data using probabilistic latent semantic analysis (PLSA). The proposed technique first performs topic decomposition (TD) on the combined domain-specific and general-domain data. It then derives the latent topic distribution of the domain of interest and computes domain-specific word n-gram counts with a PLSA-style mixture model. Finally, it applies traditional n-gram modeling to construct domain-specific LMs from the domain-specific word n-gram counts. Experimental results show that this technique outperforms both state-of-the-art relative-entropy text selection and traditional supervised training methods.
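To make the three-step pipeline concrete, below is a minimal Python sketch for the unigram case: standard PLSA fit by EM on the combined corpus, a domain topic distribution obtained by averaging P(z|d) over in-domain documents, and fractional domain-weighted word counts derived from the topic posteriors. The function names, the averaging step, and the relevance weighting are illustrative assumptions for this sketch, not the paper's exact mixture formulation.

import numpy as np

def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix (n_docs x n_words).

    Returns P(w|z) of shape (n_topics, n_words) and P(z|d) of shape
    (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # E-step: topic posterior P(z|d,w), shape (n_docs, n_topics, n_words)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        posterior = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts c(d,w) * P(z|d,w)
        expected = counts[:, None, :] * posterior
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

def domain_weighted_counts(counts, p_w_z, p_z_d, domain_doc_ids):
    """Derive domain-specific word counts as a topic-weighted mixture."""
    # Topic distribution of the domain of interest: average P(z|d) over
    # the in-domain documents (an assumption made for this sketch).
    p_z_domain = p_z_d[domain_doc_ids].mean(axis=0)
    # Weight each observed count by how much of its topic posterior mass
    # falls on domain-relevant topics.
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]
    posterior = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
    relevance = (posterior * p_z_domain[None, :, None]).sum(axis=1)
    return (counts * relevance).sum(axis=0)  # fractional unigram counts

The resulting fractional counts would then feed a conventional n-gram toolkit in place of raw counts; extending the weighting from unigrams to higher-order n-grams follows the same pattern but is omitted here for brevity.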