Latent Dirichlet learning for document summarization

Ying-Lang Chang; Jen-Tzung Chien

doi:10.1109/ICASSP.2009.4959927

Source

2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1689-1692

Abstract

Automatic summarization extracts representative content or sentences from a large corpus of documents. This paper presents a new hierarchical representation of words, sentences and documents in a corpus, and infers Dirichlet distributions for latent topics at the word level and latent themes at the sentence level. Sentence-based latent Dirichlet allocation (SLDA) is established accordingly for document summarization. Unlike vector space summarization, SLDA is built to fit the fine structure of text documents and is specifically designed for sentence selection. SLDA acts as a sentence mixture model with a mixture of Dirichlet themes, which are used to generate the latent topics of the observed words. The theme model is inherently suited to distinguishing sentences in a summarization system. In the experiments, the proposed SLDA outperforms other methods for document summarization in terms of precision, recall and F-measure.
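
The abstract reports results in terms of precision, recall and F-measure. As a minimal sketch of how these are commonly computed for sentence extraction, the snippet below compares a set of selected sentence indices against a reference set. The function name `sentence_prf` and the index-set representation are assumptions for illustration; the record does not state the paper's exact evaluation protocol.

```python
def sentence_prf(selected, reference):
    """Precision, recall and F-measure for an extractive summary.

    Both arguments are iterables of sentence indices (a hypothetical
    representation chosen for this sketch, not taken from the paper).
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)  # sentences chosen by both

    # Guard against empty sets to avoid division by zero.
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0

    # Balanced F-measure: harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # The system picks sentences 0, 2, 5; the reference summary uses 0, 1, 2, 7.
    p, r, f = sentence_prf([0, 2, 5], [0, 1, 2, 7])
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the three selected sentences appear in the reference, so precision is 2/3, recall is 2/4, and the F-measure is their harmonic mean.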

Identifiers

book ISSN: 1520-6149
book ISBN: 978-1-4244-2353-8
book e-ISBN: 978-1-4244-2354-5
DOI: 10.1109/ICASSP.2009.4959927

Keywords

text analysis; learning (artificial intelligence); sentence selection; automatic text document summarization; latent Dirichlet learning; hierarchical word representation; hierarchical sentence representation; sentence-based latent Dirichlet allocation; Data mining; Resource management; Speech; Computational modeling; Optimization; Data models; Bayesian methods; document summarization; latent Dirichlet allocation; language model; sentence extraction

Additional information

Data set: ieee

Publisher

IEEE
