Hybrid text mining model for document classification

K A Vidhya; G Aghila

doi:10.1109/ICCAE.2010.5451965

Source

2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE), vol. 1, pp. 210-214

Abstract

This work proposes a hybrid model for classifying text documents for information retrieval, combining Naive Bayes with rough set theory. Rough set theory is used for feature reduction, and the Naive Bayes theorem is used to classify documents into predefined categories by means of probabilistic values. The proposed model is to be deployed through an enhanced use of the Naive Bayes approach and rough set theory that overcomes imprecision and vagueness in the data set, thereby improving classification accuracy. In the Naive Bayes model, the word probabilities for a class are estimated by calculating the likelihood over the entire set of training documents, where the training and test data are split randomly into k subsets, such as 2/3 for training and 1/3 for testing. The model also uses a two-level hierarchy structure for training documents, with features drawn from the title, keywords, and content together with the predefined knowledge available. The rough set model includes a feature reduction technique that reduces the number of features used for classification, aiming at an optimal classification of text documents.
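The Naive Bayes step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy corpus, the fixed random seed, and the use of add-one (Laplace) smoothing are all assumptions made for illustration, and the rough set feature reduction and the title/keywords/content hierarchy are left out.

```python
import math
import random
from collections import Counter, defaultdict


def split_train_test(docs, train_fraction=2 / 3, seed=0):
    """Shuffle labeled documents and split them (2/3 for training by default)."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_naive_bayes(train_docs):
    """Collect class frequencies and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in train_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the class with the highest log posterior.

    Word likelihoods use add-one smoothing (an assumption, not stated in the abstract).
    """
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for tok in tokens:
            if tok in vocabulary:  # skip words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus: (tokens, category) pairs.
corpus = [
    (["rough", "set", "reduct"], "theory"),
    (["rough", "set", "approximation"], "theory"),
    (["set", "reduct", "attribute"], "theory"),
    (["retrieval", "query", "index"], "ir"),
    (["query", "index", "ranking"], "ir"),
    (["retrieval", "ranking", "query"], "ir"),
]

train_docs, test_docs = split_train_test(corpus)
model = train_naive_bayes(train_docs)
for tokens, label in test_docs:
    print(tokens, "expected:", label, "predicted:", classify(tokens, model))
```

In a hybrid setup like the one the abstract outlines, a feature reduction step would shrink the vocabulary before `train_naive_bayes` is called; here every training word is kept.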

Identifiers

Book ISBN: 978-1-4244-5569-0, 978-1-4244-5585-0
Book e-ISBN: 978-1-4244-5586-7
DOI: 10.1109/ICCAE.2010.5451965

Keywords

text analysis; Bayes methods; classification; data mining; information retrieval; probability; rough set theory; training document; hybrid text mining model; text document classification; naive Bayes; feature reduction; probabilistic value; word probability; Classification algorithms; Text categorization; Training; Machine learning algorithms; Accuracy; Rough sets; Approximation methods; Text Mining; Naïve Bayes; Feature Selection

Additional information

Data set: ieee

Publisher

IEEE

INFONA - science communication portal
