Search results

Items from 1 to 20 out of 30 results

chapter

Research review on key techniques of topic-based news elements extraction

Song Qing, Zhang Ying, Zhang Pengzhou

2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS) > 585 - 590

2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS)

With the development of computer and network technology and the explosion of digital Chinese news text, massive volumes of unstructured news data call for better methods of knowledge extraction and storage. On the one hand, such methods help readers understand the core content of news; on the other hand, the accumulated news knowledge supports reporting. In recent years, information extraction technology...

chapter

Novel feature selection algorithm for Chinese text categorization based on CHI

Cai Zhenliang, Wang Jian, Liu Jiqiang

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 1035 - 1039

2016 IEEE 13th International Conference on Signal Processing (ICSP)

Chinese text categorization, a key technology for processing massive Chinese text data, has been applied to information retrieval, document management, text filtering, etc. However, categorization accuracy has been the major difficulty in advancing these applications. To improve the performance of Chinese text categorization, feature selection, as an important and indispensable...

chapter

A Text Classifier of English Movie Reviews Based on Information Gain

Lianjing Jin, Wei Gong, Wenlong Fu, Hongbin Wu

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence > 454 - 457

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI)

Text classification is the foundation and core of text mining. Naive Bayes is an effective method for text classification. This paper improves the accuracy of Naive Bayes classification using improved information gain, one of the methods of feature extraction, by reducing the impact of low-frequency words. In this paper, we use a widely used corpus from NLTK. According to the test results, the accuracy of the...

chapter

Characterized by Subjective Clues on Subjective Text Recognition

Yonghua Liu, Aiping Li, Liguo Duan, Hongxiang Wang

2014 International Conference on Cloud Computing and Big Data > 20 - 26

2014 International Conference on Cloud Computing and Big Data (CCBD)

Subjective text recognition is the premise of emotion computation. Current methods are dictionary-based and statistics-based; they ignore subjective clues, which contain rich emotional information, and their accuracy is not high. To solve this problem, this paper selects the associated word, the emotional word as well as the indicative verb, the interjection, the degree adverb,...

chapter

A lexicon pool augmented Naive Bayes Classifier for Nepali Text

S. K. Thakur, V. K. Singh

2014 Seventh International Conference on Contemporary Computing (IC3) > 542 - 546

2014 Seventh International Conference on Contemporary Computing (IC3)

This paper presents our experimental work on machine classification of Nepali texts. We have implemented a Naive Bayes classifier for the task and then augmented it through multinomial lexicon pooling. The lexicon-pooled Naive Bayes classifier obtains better results on the classification task than a normal Naive Bayes implementation. This hybrid approach also helps in dealing with the unavailability...

chapter

Document categorization in multi-agent environment with enhanced machine learning classifier

Seema Singh, Chandra Prakash

2014 Seventh International Conference on Contemporary Computing (IC3) > 589 - 594

2014 Seventh International Conference on Contemporary Computing (IC3)

Text categorization tasks have gained the attention of researchers over the last 10 years with the increase in web-based document content. For finding a particular document on the web or in any large document collection, text or document categorization is a most useful task. Better systems and enhanced machine learning classifiers are needed to accomplish the task of document categorization. We designed...

chapter

Language identification: A new fast algorithm to identify the language of a text in a multilingual corpus

Said Gadri, Abdelouahab Moussaoui, Linda Belabdelouahab-Fernini

2014 International Conference on Multimedia Computing and Systems (ICMCS) > 321 - 326

2014 International Conference on Multimedia Computing and Systems (ICMCS)

Identifying the language of a text is a very important preliminary phase in the categorization of multilingual documents and in information retrieval. This phase becomes difficult if we consider only the word as the basic unit of information in texts: that may be possible for some languages, such as French or English, but is very difficult for others, such as German, Chinese and Arabic. In...

chapter

An effective method to recognize the language of a text in a collection of multilingual documents

Said Kadri, Abdelouahab Moussaoui

2013 International Conference on Electronics, Computer and Computation (ICECCO) > 208 - 211

2013 International Conference on Electronics, Computer and Computation (ICECCO)

Identifying the language of a text means that we assign this text to a language in which it is written. This identification becomes important because of the increased diversity of textual data in different languages on the web. A real recognition of the text language is not possible if we just consider the word as a basic unit of information. It could be possible in some languages but very difficult...

chapter

A global evaluation criterion for feature selection in text categorization using Kullback-Leibler divergence

Zhilong Zhen, Xiaoqin Zeng, Haijuan Wang, Lixin Han

2011 International Conference of Soft Computing and Pattern Recognition (SoCPaR) > 440 - 445

2011 International Conference of Soft Computing and Pattern Recognition

A major difficulty of text categorization is the extremely high dimensionality of the text feature space. Feature selection techniques are desirable in large-scale text categorization tasks for improving accuracy and efficiency. The χ² statistic and simplified χ² are two effective feature selection methods in text categorization. Using these two feature selection criteria, for a term, one needs to...

chapter

Application of an ant colony algorithm for text indexing

Nadia Lachetar, Halima Bahi

2011 International Conference on Multimedia Computing and Systems > 1 - 6

2011 International Conference on Multimedia Computing and Systems (ICMCS)

Every day, the mass of information available to us increases. This information would be irrelevant if our ability to access it efficiently did not increase as well. For maximum benefit, we need tools that allow us to search, sort, index, store, and analyze the available data. We also need tools that help us find the desired information in a reasonable time by performing certain tasks for us. One of...

chapter

A Multiclass SVM Method via Probabilistic Error-Correcting Output Codes

Zhanyi Wang, Weiran Xu, Jiani Hu, Jun Guo

2010 International Conference on Internet Technology and Applications > 1 - 4

2010 International Conference on Internet Technology and Applications (iTAP 2010)

Error-correcting output code (ECOC) is an effective approach to solve the problem of multiclass SVM. In this paper, a probabilistic approach that is based on ECOC is proposed. In the training stage, a coding scheme is predefined, and a special model is trained by samples. In the classification stage, besides the labels from SVM as usual, posterior probabilities of labels are also calculated. They...

chapter

Research on Short Text Classification Algorithm Based on Statistics and Rules

Zhou Faguo, Zhang Fan, Yang Bingru, Yu Xingang

2010 Third International Symposium on Electronic Commerce and Security > 3 - 7

2010 Third International Symposium on Electronic Commerce and Security (ISECS 2010)

In this paper, we first give an overview of short text research and short text classification. Building on several commonly used classic text classification algorithms, and mainly according to the major feature extraction methods, a short text classification algorithm based on statistics and rules is proposed. Experiments show that this algorithm has better performance than other algorithms...

chapter

Text Mining and the Future Exploration

Cao Lijun, Cui Yong, Yang Yanping, Liu Xiyin

2010 International Conference on Communications and Mobile Computing > 1 > 237 - 241

2010 International Conference on Communications and Mobile Computing (CMC 2010)

On the basis of an analysis of the basic concepts and the process of text mining, the present study proposes some new methods for the extraction of text features, deflation of the characteristic collection, extraction of study and knowledge patterns, and appraisal of model quality. It also compares two types of text categorization, text classification and text clustering, and it briefly explores...

chapter

A New Method of Training Sample Selection in Text Classification

Yixing Liao, Xuezeng Pan

2010 Second International Workshop on Education Technology and Computer Science > 1 > 211 - 214

2010 2nd International Workshop on Education Technology and Computer Science (ETCS)

To address noise samples in the training dataset, this paper proposes a new method for reducing the size of the training dataset that is applicable to text classification. The method describes the distribution of the training dataset according to the representativeness score of each sample within the class it belongs to, so as to reveal the representative samples and noise samples in each class. The new method...

chapter

Hybrid text mining model for document classification

K A Vidhya, G Aghila

2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE) > 1 > 210 - 214

2nd International Conference on Computer and Automation Engineering (ICCAE 2010)

This work proposes a hybrid model for text document classification for information retrieval using Naive Bayes and Rough set theory. Rough set theory is used for feature reduction and Naive Bayes theorem is used for classification of documents into the predefined categories by means of the probabilistic values. The deployment of the proposed model is planned through an enhanced method of the utilization...

chapter

A Term Weighting Approach with Subjective Logic Reasoning for Text Categorization

Qingzheng Jiao, Chengjian Wei

2009 International Conference on Computational Intelligence and Software Engineering > 1 - 5

2009 International Conference on Computational Intelligence and Software Engineering

A new term weighting approach is used to construct the simplest linear weighting classifier (SL). The weighting is computed from the probability standard deviation of terms as a baseline weighting, regulated by term distribution parameters based on subjective logic reasoning. In the assessment process of terms distributed parameters, the model of the term reputation in documents categories based on Beta...

chapter

Large-Scale Hierarchical Text Classification Based on Path Semantic Vector and Prior Information

Feng Gao, Weiming Fu, Yiping Zhong, Danfeng Zhao

2009 International Conference on Computational Intelligence and Security > 1 > 54 - 58

2009 International Conference on Computational Intelligence and Security (CIS 2009)

Although an improvement of hierarchical text classification can be achieved by using hierarchical structure information, existing hierarchical text classification methods suffer from two problems: data skew (especially in large-scale hierarchy) and error propagation. In this paper, we first define the concept of path-based semantic vector for the presentation of categories. Then a set of additional...

chapter

Combining Multiple Feature Selection Methods for Text Categorization by Using Rank-Score Characteristics

Yanjun Li, D.F. Hsu, S.M. Chung

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 508 - 517

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has been done to improve the performance of individual feature selection methods, but not much on their combinations. In this paper, we propose a method of combining multiple feature selection methods by...

chapter

The Application Research of Topic Word List In Text Automatic Classification

Huan Huang, Qingtang Liu, Linjing Wu, Tao Huang, et al.

2009 Second International Symposium on Knowledge Acquisition and Modeling > 2 > 111 - 114

2009 Second International Symposium on Knowledge Acquisition and Modeling (KAM 2009)

When traditional text classification technologies classify academic dissertations, the dimension of the extracted feature terms is high, and the terms cannot represent the theme of the thesis. As a result, efficiency is very low and accuracy is not high. Topic words are few in number and reflect the theme of a thesis well. Accordingly, the paper proposes to extract the topic words with topic...

chapter

The Optimization of Threshold-Based Naive Bayesian Algorithm

Wang Xin, Jiang Hua

2009 Third International Conference on Genetic and Evolutionary Computing > 762 - 764

2009 Third International Conference on Genetic and Evolutionary Computing (WGEC 2009)

To perform text classification and spam filtering, the Naive Bayesian algorithm estimates which class a text belongs to based on statistical probability values derived from the characteristics of the training samples, but it is prone to the overflow problem. This article optimizes the algorithm by setting a threshold; the optimization strategy is comparing the times...

Keywords:
PROBABILITY
Publication type:
book

Keywords

TRAINING (21)
CLASSIFICATION ALGORITHMS (19)
TEXT ANALYSIS (19)
ACCURACY (9)
FEATURE EXTRACTION (9)
MACHINE LEARNING (9)
SUPPORT VECTOR MACHINES (8)
TEXT CLASSIFICATION (8)
DATA MINING (7)
ALGORITHM DESIGN AND ANALYSIS (6)
CLASSIFICATION (6)
PATTERN CLASSIFICATION (5)
TEXT MINING (5)
BAYES METHODS (4)
DISTANCE MEASUREMENT (4)
FEATURE SELECTION (4)
INFORMATION RETRIEVAL (4)
LEARNING (ARTIFICIAL INTELLIGENCE) (4)
NAIVE BAYES (4)
NATURAL LANGUAGE PROCESSING (4)
SUPPORT VECTOR MACHINE (4)
SUPPORT VECTOR MACHINE CLASSIFICATION (4)
TEXT RECOGNITION (4)
COMPUTERS (3)
GRAPH THEORY (3)
MATHEMATICAL MODEL (3)
PRAGMATICS (3)
VOCABULARY (3)
BAYES (2)
CLUSTERING ALGORITHMS (2)
EDUCATIONAL INSTITUTIONS (2)
ENCODING (2)
INCREMENTAL LEARNING (2)
INFORMATION FILTERING (2)
INTERNET (2)
LANGUAGE IDENTIFICATION (2)
NOISE (2)
PATTERN CLUSTERING (2)
POSTERIOR PROBABILITY (2)
SEMANTICS (2)
TRAINING DATA (2)
WORD PROCESSING (2)
ACADEMIC DISSERTATIONS (1)
ACCUMULATED PROBABILITY VALUES (1)
AFFINITY PROPAGATION (1)
AGENT (1)
ANT COLONY ALGORITHM (1)
APERY ALGORITHM (1)
APPROXIMATION METHODS (1)
ASSOCIATION RULE MINING (1)
AUTOMATIC TEXT CATEGORIZATION (1)
AUTOMATIC TEXT SUMMARIZATION (1)
BAG OF WORDS (1)
BAYES CLASSIFICATION (1)
BAYES FORMULA (1)
BAYESIAN CLASSIFICATION METHOD (1)
BAYESIAN METHODS (1)
BAYESIAN TEXT CLASSIFICATION METHODS (1)
BETA PROBABILITY DENSITY FUNCTION (1)
CHARACTERISTIC COLLECTION (1)
CHARACTERISTIC COLLECTION DEFLATION (1)
CHI (1)
CHI-SQUARE STATISTIC (1)
CHINESE TEXT CATEGORIZATION (1)
CHINESE TEXT CLASSIFICATION ALGORITHM (1)
CLASS-CONDITIONAL PROBABILITY DISTRIBUTION (1)
CLUSTERING TECHNIQUE (1)
COGNITION (1)
COMBINATORIAL FUSION ANALYSIS (1)
COMBINATORIAL FUSION ANALYSIS (CFA) (1)
COMPUTATIONAL MODELING (1)
CONNECTIVE STRENGTH (1)
CONVENTIONAL INCREMENTAL LEARNING ALGORITHM (1)
COOCCURRENCE PROBABILITY (1)
DATA SKEW (1)
DATA SPARSE CATEGORIES (1)
DEPENDENCY PARSING (1)
DICTIONARIES (1)
DIGIT CLASSIFICATION (1)
DOCUMENT IMAGE PROCESSING (1)
DOCUMENT VECTOR (1)
DOCUMENTS CATEGORIES (1)
EDUCATIONAL TECHNOLOGY (1)
ELEMENTS EXTRACTION (1)
EMOTION COMPUTATION (1)
ENTROPY (1)
EQUATIONS (1)
ERROR ANALYSIS (1)
ERROR CORRECTION CODES (1)
ERROR PROPAGATION (1)
ERROR PROPAGATION REDUCTION (1)
EVENT EXTRACTION (1)
EXPECTATION-MAXIMISATION ALGORITHM (1)
FEATURE EXTRACTION METHODS (1)
FEATURE REDUCTION (1)
FEATURE TERMS EXTRACTION (1)
FEEDBACK (1)
FILTERING (1)

INFONA - science communication portal
