Hierarchical classification has become a popular research topic, particularly for text categorization on the web. For a large web corpus, the hierarchy can contain hundreds of thousands of topics, so it is common to handle this task with a flat classification approach, inducing a binary classifier for each leaf-node class only. However, this approach often suffers from low prediction accuracy due to class imbalance in the training data. In this paper, we propose two novel strategies: (i) “Top-Level Pruning” to narrow down the candidate classes, and (ii) “Exclusive Top-Level Training Policy” to build more effective classifiers by utilizing the top-level data. Experiments on the Wikipedia dataset show that our system consistently outperforms the traditional flat approach on all hierarchical classification metrics.
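The pruning idea can be illustrated with a minimal sketch: classify a document at the top level first, then score only the leaf classes that fall under the most likely top-level categories. All names here (the classifier callables, the `children_of` hierarchy map, the parameter `k`) are illustrative assumptions, not the paper's actual implementation.

```python
def predict_with_pruning(doc, top_level_clf, leaf_clfs, children_of, k=3):
    """Score only leaves under the k most likely top-level categories.

    top_level_clf(doc) -> {top_class: score}
    leaf_clfs[leaf](doc) -> score for one binary leaf classifier
    children_of[top_class] -> list of leaf classes under that node
    """
    # Keep the k highest-scoring top-level categories.
    top_scores = top_level_clf(doc)
    top_k = sorted(top_scores, key=top_scores.get, reverse=True)[:k]

    # Candidate leaves are restricted to the surviving subtrees,
    # so most of the hierarchy's binary classifiers are never run.
    candidates = [leaf for t in top_k for leaf in children_of[t]]

    # Run only the pruned set of leaf classifiers and pick the best.
    leaf_scores = {leaf: leaf_clfs[leaf](doc) for leaf in candidates}
    return max(leaf_scores, key=leaf_scores.get)
```

With a two-level hierarchy and `k=1`, a high-scoring leaf in a pruned subtree is never considered, which is the intended trade-off: far fewer classifier evaluations per document at the cost of committing early to a top-level decision.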