Search results

Items from 1 to 20 out of 45 results

chapter

A comprehensive study of text classification algorithms

Vikas K Vijayan, K. R. Bindu, Latha Parameswaran

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) > 1109 - 1113

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI)

A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of classifying text documents into a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
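Multinomial Naive Bayes is one of the simplest text classification algorithms covered by surveys of this kind. The sketch below is an illustrative stand-in, not the authors' code. The whitespace tokenizer, the add-one (Laplace) smoothing and the toy spam/ham data are all assumptions.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()

def train_nb(docs, labels):
    """Collect class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, doc):
    """Return the class maximizing log P(c) + sum of log P(w|c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            # add-one smoothing so unseen (word, class) pairs do not zero the product
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["cheap pills buy now", "win money now",
        "meeting agenda attached", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "buy cheap money"))         # spam
print(predict_nb(model, "meeting notes attached"))  # ham
```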

chapter

Stemming versus multi-words indexing for Arabic documents classification

Mohamed Salim El Bazzi, Taher Zaki, Driss Mammass, Abdelatif Ennaji

2016 11th International Conference on Intelligent Systems: Theories and Applications (SITA) > 1 - 5

2016 11th International Conference on Intelligent Systems: Theories and Applications (SITA)

Document indexing is the main step in a conventional document classification or information retrieval framework. This study aims to highlight the influence of feature type on the efficiency of a classification system. Empirical results on an Arabic dataset reveal that the choice of extracted feature type has a significant impact on conserving semantic information and improving classification accuracy,...
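As a rough illustration of the multi-word feature type, contiguous word bigrams can stand in for multi-word terms. This is an assumption made for illustration only. The abstract does not give the study's actual multi-word extraction procedure, and the function name is hypothetical.

```python
def word_ngrams(text, n=2):
    """Return contiguous n-word sequences (n=2 gives bigrams) as candidate multi-word features."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(word_ngrams("information retrieval system"))
# ['information retrieval', 'retrieval system']
```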

chapter

A systematic approach to design of a text categorizer

Roger B. Bradford, John Pozniak

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC) > 509 - 514

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

In this paper, we implement a systematic approach to text categorization using latent semantic indexing (LSI). A novel feature of our approach is that we iteratively refine the LSI space used for categorization. Using a verification set, we also employ LSI to determine the values of all parameters controlling the steps of the categorization process. Our approach is designed to scale to enterprise-level...
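For reference, the standard LSI formulation applies a truncated singular value decomposition to the term-document matrix. The notation below is the usual textbook one, not the paper's. The iterative refinement of the LSI space described above is not shown.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{\mathbf{q}} = \Sigma_k^{-1} U_k^{\top} \mathbf{q}
```

Here A is the m × n term-document matrix and k is the number of retained dimensions. The vector q is a query or new document written as an m-dimensional term vector. Its folded-in coordinates q̂ are typically compared with the document coordinates (the rows of V_k) by cosine similarity.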

chapter

A review on feature selection and feature extraction for text classification

Foram P. Shah, Vibha Patel

2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) > 2264 - 2268

2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET)

Day by day, the number of text documents in digital form is increasing. Text classification is used to organize these documents. However, text classification suffers from the high dimensionality of the feature space. This problem is addressed by feature selection and feature extraction methods, which also improve the performance of text categorization. The feature selection and...
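Document-frequency thresholding is one of the simplest feature selection filters of the kind this review discusses. Any term that occurs in fewer than `min_df` documents is dropped. The threshold value and the example sentences below are assumptions.

```python
from collections import Counter

def document_frequency(docs):
    """Count in how many documents each term appears at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df

def select_features(docs, min_df=2):
    """Keep only terms whose document frequency is at least min_df."""
    df = document_frequency(docs)
    return sorted(term for term, n in df.items() if n >= min_df)

docs = ["text classification with svm",
        "feature selection for text classification",
        "svm and knn for text"]
print(select_features(docs))  # ['classification', 'for', 'svm', 'text'] (9 terms reduced to 4)
```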

chapter

Table based KNN for categorizing words

Taeho Jo

2016 18th International Conference on Advanced Communication Technology (ICACT) > 692 - 696

2016 18th International Conference on Advanced Communication Technology (ICACT)

In this research, we propose the table based KNN as an approach to text categorization. In previous work, we found that encoding texts into tables improved text categorization performance, so in this research we consider encoding words, as well as texts, into tables. In this research, we encode words into tables where entries are texts and their weights,...
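The truncated abstract does not describe the table-based encoding itself. The sketch below therefore shows only the conventional KNN categorizer that such work builds on: texts become term-frequency vectors, they are compared by cosine similarity, and the k nearest training texts take a majority vote. The helper names and the toy finance/sports data are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector of a text (whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(train, query, k=3):
    """train is a list of (text, label); majority vote among the k most similar texts."""
    q = tf_vector(query)
    ranked = sorted(train, key=lambda item: cosine(q, tf_vector(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [
    ("stock market shares fell", "finance"),
    ("bank interest rates rise", "finance"),
    ("market rates and shares", "finance"),
    ("team wins football match", "sports"),
    ("football player scores goal", "sports"),
]
print(knn_classify(train, "shares and interest rates"))  # finance
```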

chapter

Table based AHC algorithm for clustering words

Taeho Jo

2016 18th International Conference on Advanced Communication Technology (ICACT) > 570 - 575

2016 18th International Conference on Advanced Communication Technology (ICACT)

This research proposes the table based AHC algorithm as an approach to the word clustering task. Encoding texts into tables was successful in previous work on text categorization and text clustering; conversely to text encoding, if texts are assumed to be the elements of each word, it becomes possible to encode words into tables. In this research,...
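The table-based encoding is not shown here either. This is a plain single-linkage AHC sketch in which each word is assumed to be represented by a vector of its counts in four hypothetical texts, echoing the idea of treating texts as the elements of each word. The vectors and function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ahc(vectors, n_clusters):
    """Single-linkage agglomerative clustering: repeatedly merge the two most similar clusters."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: similarity of the closest pair of members
                sim = max(cosine(vectors[a], vectors[b])
                          for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Hypothetical word vectors: counts of each word in four texts
word_vectors = {
    "goal":   [3, 2, 0, 0],
    "match":  [2, 3, 0, 0],
    "stock":  [0, 0, 4, 1],
    "market": [0, 0, 3, 2],
}
print(ahc(word_vectors, 2))  # [['goal', 'match'], ['stock', 'market']]
```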

chapter

Feature extraction for co-occurrence-based cosine similarity score of text documents

Ammar Ismael Kadhim, Yu-N Cheah, Nurul Hashimah Ahamed, Lubab A. Salman

2014 IEEE Student Conference on Research and Development > 1 - 4

2014 IEEE Student Conference on Research and Development (SCOReD)

A major challenge in topic classification (TC) is the high dimensionality of the feature space. Therefore, feature extraction (FE) plays a vital role in topic classification in particular and text mining in general. FE based on cosine similarity score is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can be impossible to process further...
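For reference, here is the cosine similarity score between two document vectors, with a small worked example over a hypothetical three-term vocabulary (text, mining, classification). Plain term-frequency vectors are assumed. The paper's co-occurrence-based features are not reproduced.

```latex
\cos(\mathbf{d}_1, \mathbf{d}_2) = \frac{\mathbf{d}_1 \cdot \mathbf{d}_2}{\lVert \mathbf{d}_1 \rVert \, \lVert \mathbf{d}_2 \rVert}

\mathbf{d}_1 = (1, 1, 0), \quad \mathbf{d}_2 = (1, 1, 1)
\;\Rightarrow\;
\cos(\mathbf{d}_1, \mathbf{d}_2) = \frac{1 + 1 + 0}{\sqrt{2}\,\sqrt{3}} = \frac{2}{\sqrt{6}} \approx 0.816
```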

chapter

News classification based on their headlines: A review

Mazhar Iqbal Rana, Shehzad Khalid, Muhammad Usman Akbar

17th IEEE International Multi Topic Conference 2014 > 211 - 216

2014 IEEE 17th International Multi-Topic Conference (INMIC)

For the last few years, text mining has been gaining significant importance, since knowledge is now available to users through a variety of sources, e.g. electronic media, digital media, print media, and many more. Because text is so widely available in numerous forms, a lot of unstructured data has been recorded, and research experts have found numerous ways in the literature to convert this scattered text...

chapter

A method of pre-sentence text based on Map/Reduce storage and indexing classification

Wu Qing, Yu Yue, Yao Yi, Wu Liang

2014 IEEE 5th International Conference on Software Engineering and Service Science > 195 - 199

2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS)

Today, as more and more businesses and individuals turn to cloud computing, the amount of data stored on cloud platforms is also growing. How to store, manage and use these data quickly and effectively in a cloud environment has therefore become a very important and challenging issue. This paper mainly discusses a storage model based on Map/Reduce text categorization, while also combining forecasting data...

chapter

Effective categorization of text in practical design

S. Ravi, M. Sambath, K. Ramesh Kumar

International Conference on Information Communication and Embedded Systems (ICICES2014) > 1 - 5

2014 International Conference on Information Communication and Embedded Systems (ICICES)

Data mining extracts novel and useful knowledge from large repositories of data and has become an effective means of analysis and decision making in corporations. In many information processing tasks, labels are usually expensive and unlabeled data points are abundant. To reduce the cost of collecting labels, it is crucial to predict which unlabeled examples are the most informative, i.e., improve the classifier...

chapter

A self appreciating approach of text classifier based on concept mining

K Arul Deepa, C Deisy

2012 International Conference on Computer Communication and Informatics > 1 - 5

2012 International Conference on Computer Communication and Informatics (ICCCI)

A good text classifier is a classifier that efficiently categorizes large sets of text documents in a reasonable time frame and with an acceptable accuracy. Most of the text classification approaches are based on the statistical analysis of a term, either a word or a phrase. Though statistical term analysis shows the importance of the term, it is tedious to analyze when more than one term has the...

chapter

Automatic Text categorization and summarization using rule reduction

C. Lakshmi Devasena, M. Hemalatha

IEEE-International Conference On Advances In Engineering, Science And Management (ICAESM -2012) > 594 - 598

2012 International Conference on Advances in Engineering, Science and Management (ICAESM)

Text mining is a new field that attempts to bring together meaningful information from natural language text. Automatic Text categorization and summarization is the process of assigning pre-defined class labels to incoming, unclassified documents. The class labels are defined based on a set of examples of pre-classified documents used as a training corpus. This research work comprises an automatic...

chapter

Alida, a cognitive approach of text categorization

Yann Vigile Hoareau, Adil El-Ghali

2011 IEEE Workshop on Affective Computational Intelligence (WACI) > 1 - 6

2011 IEEE Workshop on Affective Computational Intelligence - Part of 17273 - 2011 Ssci

This paper proposes a model of text categorization named Alida, which combines a model of categorization inspired by Nosofsky's classical cognitive models of categorization with a semantic space model as a system of semantic knowledge representation. The model addresses large-scale text categorization applications in opinion mining across different domains and languages. The performance...

chapter

Classifying Web Pages Using Information Extraction Patterns Preliminary Results and Findings

Lay-Ki Soon, Sang Ho Lee

2010 Sixth International Conference on Signal-Image Technology and Internet Based Systems > 195 - 202

Sixth International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2010)

Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords, we propose to represent web pages using information extraction patterns that are...

chapter

Improving Arabic document categorization: Introducing local stem

Eiman Tamah Al-Shammari

2010 10th International Conference on Intelligent Systems Design and Applications > 385 - 390

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

Stemming is a fundamental step in processing textual data preceding the tasks of text mining, Information Retrieval (IR), and natural language processing (NLP). The common goal of stemming is to standardize words by reducing a word to its base (root or stem), and it can thus also be considered a feature reduction technique. This paper aims at presenting a new dictionary-free, content-based Arabic stemmer...
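The following deliberately tiny English suffix-stripping stemmer makes the "feature reduction" point concrete. It is not the Arabic local-stem method of this paper, and the suffix list and minimum stem length are hypothetical choices. Several surface forms collapse onto one stem, which shrinks the vocabulary.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny suffix list

def stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["classify", "indexing", "indexed", "indexes", "index", "documents", "document"]
stems = {stem(w) for w in words}
print(len(set(words)), "->", len(stems))  # 7 -> 3
```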

chapter

Local Feature Selection for Generation of Ensembles in Text Clustering

M N Ribeiro, R B C Prudêncio

2010 Eleventh Brazilian Symposium on Neural Networks > 67 - 72

2010 Eleventh Brazilian Symposium on Neural Networks (SBRN 2010)

In the context of text clustering, global feature selection tries to identify a single subset of features which are relevant to all clusters. However, the clustering process might be improved by considering different subsets of features for locally describing each cluster. In experiments with local feature selection, it was observed that the resulting partitions were unstable but there were cohesive...

article

Reflective random indexing for semi-automatic indexing of the biomedical literature

Vidya Vasuki, Trevor Cohen

Journal of Biomedical Informatics > 2010 > 43 > 5 > 694-700

The rapid growth of biomedical literature is evident in the increasing size of the MEDLINE research database. Medical Subject Headings (MeSH), a controlled set of keywords, are used to index all the citations contained in the database to facilitate search and retrieval. This volume of citations calls for efficient tools to assist indexers at the US National Library of Medicine (NLM). Currently, the...

chapter

A New Approach for Better Document Retrieval and Classification Performance Using Supervised WSD and Concept Graph

R Soltanpoor, M Mohsenzadeh, M Mohaqeqi

2010 First International Conference on Integrated Intelligent Computing > 32 - 38

2010 First International Conference on Integrated Intelligent Computing (ICIIC 2010)

Word Sense Disambiguation (WSD) is a main task in natural language processing (NLP). Supervised WSD methods have been shown to be more effective than other WSD methods, although they are limited by the size of the manually annotated training set. On the other hand, a concept graph is a weighted graph in which each edge represents the relationship between two concepts (the relevancy of each pair of concepts)....

chapter

A two-stage feature selection method for text categorization

Jiana Meng, Hongfei Lin

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1492 - 1496

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Feature selection for text classification is a well-studied problem whose goals are improving classification effectiveness, computational efficiency, or both. In this paper, we propose a two-stage feature selection algorithm based on a kind of feature selection method and latent semantic indexing. Traditional word-matching-based text categorization systems use the vector space model to represent the...

chapter

Query-relevant document representation for text clustering

Masoud Makrehchi

2010 Fifth International Conference on Digital Information Management (ICDIM) > 132 - 138

2010 Fifth International Conference on Digital Information Management (ICDIM 2010)

In text categorization, one well-known document representation is bag-of-words. Although it is simple and popular, it ignores semantics, underlying linguistic information, and word correlations. In this paper, a new representation for text data is proposed which is called Bag-Of-Queries (BOQ). First, a taxonomy of the terms in the local vocabulary is extracted. Extracting a taxonomy is performed by...
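The limitation named here, that bag-of-words ignores word order, can be demonstrated in a few lines. The example sentences are assumptions.

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to term counts; word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)    # True: different meanings, identical bags
print(a["the"])  # 2
```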

Keywords:
INDEXING

INFONA - science communication portal
