A novel weighting scheme for efficient document indexing and classification

Bashar Tahayna; Ramesh Kumar Ayyasamy; Saadat Alhashmi; Siew Eu-Gene

doi:10.1109/ITSIM.2010.5561553

Source

2010 International Symposium on Information Technology > 2 > 783 - 788

Abstract

In this paper we propose and illustrate the effectiveness of a new topic-based document classification method. The proposed method uses Wikipedia, a large-scale Web encyclopaedia with a large collection of high-quality articles and a category system. An N-gram technique maps the document onto Wikipedia, transforming it from a “bag of words” into a “bag of concepts”. Based on this transformation, a novel concept-based weighting scheme (denoted Conf.idf) is proposed to index the text in the spirit of the traditional tf.idf indexing scheme. Moreover, a genetic algorithm-based support vector machine optimization method is used for feature subset and instance selection. Experimental results show that the proposed weighting scheme outperforms the traditional indexing and weighting scheme.
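
The abstract does not give the Conf.idf formula, so the following is only a minimal sketch of the general idea it describes: N-grams are matched against a concept dictionary to turn a "bag of words" into a "bag of concepts", and the concepts are then weighted in a tf.idf-like way. The dictionary `CONCEPTS`, the function names, the greedy longest-match strategy, and the weight `frequency * log(N / df)` are all assumptions made for illustration. They are not the authors' method or data.

```python
import math
from collections import Counter

# Hypothetical stand-in for Wikipedia article titles (assumption, not real data).
CONCEPTS = {
    ("support", "vector", "machine"): "Support vector machine",
    ("genetic", "algorithm"): "Genetic algorithm",
    ("wikipedia",): "Wikipedia",
}
MAX_N = max(len(key) for key in CONCEPTS)


def bag_of_concepts(text):
    """Map a text to concept counts using greedy longest-match N-gram lookup."""
    tokens = text.lower().split()
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest N-gram first, then shorter ones.
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            concept = CONCEPTS.get(tuple(tokens[i:i + n]))
            if concept:
                counts[concept] += 1
                i += n
                break
        else:
            i += 1  # no concept starts at this token
    return counts


def concept_idf_weights(docs):
    """Assumed tf.idf-style weight: concept frequency * log(N / document frequency)."""
    bags = [bag_of_concepts(doc) for doc in docs]
    df = Counter(concept for bag in bags for concept in bag)
    n_docs = len(bags)
    return [
        {concept: freq * math.log(n_docs / df[concept]) for concept, freq in bag.items()}
        for bag in bags
    ]


docs = [
    "a genetic algorithm tunes a support vector machine",
    "wikipedia describes the genetic algorithm",
]
weights = concept_idf_weights(docs)
```

In this toy corpus the concept "Genetic algorithm" occurs in both documents and therefore gets weight zero, while concepts unique to one document get a positive weight, which mirrors how idf discounts terms that do not discriminate between documents.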

Identifiers

Book ISSN:	2155--897
Book e-ISSN:	2155-899X
Book ISBN:	978-1-4244-6715-0
Book e-ISBN:	978-1-4244-6718-1, 978-1-4244-6717-4
DOI	10.1109/ITSIM.2010.5561553

Keywords

text analysis; genetic algorithms; indexing; pattern classification; support vector machines; genetic algorithm-based support vector machine optimization method; document indexing; topic-based document classification method; Wikipedia; large scale Web encyclopaedia; category system; N-gram technique; concept-based weighting scheme; text indexing; Classification algorithms; Kernel; term weighting scheme; feature subset selection

Additional information

Dataset: ieee

Publisher

IEEE
