Statistical Identification of Key Phrases for Text Classification

Frans Coenen; Paul Leng; Robert Sanderson; Yanbo J. Wang

doi:10.1007/978-3-540-73499-4_63

Source

Lecture Notes in Computer Science, Machine Learning and Data Mining in Pattern Recognition, Text and Document Mining, pp. 838-853

Abstract

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.
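The abstract does not say which statistical properties are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes keywords are words whose document frequency falls between two thresholds and whose occurrences are concentrated in one class, and that phrases are runs of adjacent keywords. The names `select_keywords` and `key_phrases` and the parameters `min_df`, `max_df` and `min_share` are hypothetical, and whitespace tokenization is a simplification.

```python
from collections import Counter, defaultdict


def select_keywords(docs, min_df, max_df, min_share):
    """Pick keywords from labelled documents using only counts.

    docs is a list of (text, label) pairs. A word is kept when the
    fraction of documents containing it lies in [min_df, max_df] and at
    least min_share of those documents belong to a single class.
    All thresholds are illustrative assumptions.
    """
    n_docs = len(docs)
    doc_freq = Counter()               # word -> number of documents containing it
    class_freq = defaultdict(Counter)  # word -> label -> number of documents

    for text, label in docs:
        for word in set(text.lower().split()):
            doc_freq[word] += 1
            class_freq[word][label] += 1

    keywords = {}
    for word, df in doc_freq.items():
        support = df / n_docs
        if not (min_df <= support <= max_df):
            continue  # too rare or too common to be informative
        label, count = class_freq[word].most_common(1)[0]
        if count / df >= min_share:
            keywords[word] = label  # word is concentrated in this class
    return keywords


def key_phrases(text, keywords):
    """Return runs of two or more adjacent keywords found in text."""
    phrases, run = [], []
    for word in text.lower().split():
        if word in keywords:
            run.append(word)
            continue
        if len(run) > 1:
            phrases.append(" ".join(run))
        run = []
    if len(run) > 1:
        phrases.append(" ".join(run))
    return phrases


# Tiny made-up corpus, for illustration only.
docs = [
    ("stock market rises", "finance"),
    ("stock market falls today", "finance"),
    ("team wins match", "sport"),
    ("team loses match today", "sport"),
]
keywords = select_keywords(docs, min_df=0.5, max_df=0.8, min_share=0.75)
phrases = key_phrases("stock market falls today", keywords)
```

Nothing in this sketch depends on a stop-word list, stemmer or parser; only counts over the training documents are used, which is the sense in which such a selection could be called language-independent.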

Identifiers

series ISSN: 0302-9743
series e-ISSN: 1611-3349
book ISBN: 978-3-540-73498-7
book e-ISBN: 978-3-540-73499-4
DOI: 10.1007/978-3-540-73499-4_63

Authors

Frans Coenen

Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom

Paul Leng

Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom

Robert Sanderson

Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom

Yanbo J. Wang

Department of Computer Science, The University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, United Kingdom

Keywords

Text Classification; Text Preprocessing

Additional information

Data set: Springer

Publisher

Springer Berlin Heidelberg

chapter
