Large scale multi-label text classification of a hierarchical dataset using Rocchio algorithm

Sowmya B J; Chetan; K.G. Srinivasa

doi:10.1109/CSITSS.2016.7779373

Source

2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 291-296

Abstract

Hierarchical data is becoming increasingly prominent, especially on the web. Wikipedia is one such example: millions of documents are each assigned to multiple classes arranged in a hierarchy. This gives rise to the interesting problem of automatically classifying new documents. As the dataset grows, so does the number of classes, and the data appears to remain sparse even as it grows, which makes classification at this scale challenging. We present two text categorization algorithms, the Rocchio algorithm and kNN. We implement and compare both methods to better understand which approach to take when classifying hierarchical data.
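
The abstract does not give implementation details, so the following is only a minimal sketch of the two techniques it names, not the authors' code. It assumes plain term-frequency vectors, cosine similarity, and a flat label set (the hierarchy is ignored). The function names, the `ratio` threshold for picking several labels, the majority-vote rule for kNN, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Sparse, L2-normalised term-frequency vector (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_rocchio(docs):
    """Build one centroid per label: the mean vector of its documents."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for text, labels in docs:
        vec = tf_vector(text)
        for label in labels:
            counts[label] += 1
            for term, w in vec.items():
                sums[label][term] += w
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def predict_rocchio(centroids, text, ratio=0.8):
    """Return labels whose centroid scores at least `ratio` times the best
    score. This multi-label rule is an assumption, not taken from the paper."""
    vec = tf_vector(text)
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return set()
    return {label for label, s in scores.items() if s >= ratio * best}


def predict_knn(docs, text, k=3):
    """Return labels carried by more than half of the k nearest documents
    (again an assumed voting rule)."""
    vec = tf_vector(text)
    ranked = sorted(docs, key=lambda d: cosine(vec, tf_vector(d[0])), reverse=True)
    votes = Counter(label for _, labels in ranked[:k] for label in labels)
    return {label for label, n in votes.items() if n > k / 2}


# Toy multi-label training set (hypothetical data).
train = [
    ("neural network training gradient", {"cs", "ml"}),
    ("gradient descent optimisation network", {"ml"}),
    ("protein folding cell biology", {"bio"}),
    ("cell membrane protein", {"bio"}),
]
centroids = train_rocchio(train)

print(predict_rocchio(centroids, "protein cell"))
print(predict_knn(train, "protein cell", k=3))
```

In this sketch, Rocchio compares a new document against one centroid per class, so prediction cost grows with the number of classes, whereas kNN compares against every training document. That difference is one reason a centroid-based classifier can be attractive when the dataset is large.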

Identifiers

Book ISBN: 978-1-5090-1020-2
Book e-ISBN: 978-1-5090-1022-6, 978-1-5090-1021-9
DOI: 10.1109/CSITSS.2016.7779373

Authors

Sowmya B J

Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India

Chetan

Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India

K.G. Srinivasa

Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India

Keywords

Multilabel classification; centroid-based classifier; Rocchio algorithm; kNN algorithm

Additional information

Data set: ieee

Publisher

IEEE
