Search results

Items from 1 to 20 out of 30 results

chapter

Information retrieval: A new multilingual stemmer based on a statistical approach

Said Gadri, Abdelouahab Moussaoui

2015 3rd International Conference on Control, Engineering & Information Technology (CEIT) > 1 - 6

2015 3rd International Conference on Control, Engineering & Information Technology (CEIT)

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is used in many areas of research, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text mining tasks. Stemming is frequently useful in text categorization...

chapter

Term-frequency Based Feature Selection Methods for Text Categorization

Yan Xu, Lin Chen

2010 Fourth International Conference on Genetic and Evolutionary Computing > 280 - 283

2010 Fourth International Conference on Genetic and Evolutionary Computing (ICGEC 2010)

A major difficulty of text categorization is the high dimensionality of the feature space. Feature selection is an important step in text categorization to reduce the feature space. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), mutual information (MI), and so on are commonly applied in text categorization, but they do not use term frequency...

chapter

Research and Implement of Chinese Text Classifier Based on Naïve Bayes Method

Jian Huang, Zhongdi Cen, Qiuhong Zheng

2010 Sixth International Conference on Semantics, Knowledge and Grids > 426 - 428

2010 Sixth International Conference on Semantics Knowledge and Grid (SKG 2010)

The Naïve Bayes classifier has proved to be one of the most effective classifiers and is widely used. It applies statistical theory to text classification. This paper researched and implemented a Chinese text classifier in Java based on the Naïve Bayes method. First of all, this paper described the text classification system; the content includes text information expressing, extracting and the method of Chinese...

chapter

The research of the feature selection method based on the ECE and quantum genetic algorithm

Zhang Wei, Qiu Ye

2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE) > 6 > V6-193 - V6-196

2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE 2010)

Feature selection is a critical technique in automatic text categorization. A new text feature selection method based on the quantum genetic algorithm is proposed in this paper. First, the ECE statistical method is used to remove redundant and noisy features from the original feature set; genetic algorithms are then used to optimize the feature subset; finally the best feature...

chapter

Chinese text categorization study based on CBM learning

Yan Zhan, Hao Chen

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1511 - 1514

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case base grows at a fast rate, and this causes inefficiency in the case retrieval process. Using Case-Base Maintenance learning via the GC (Generalization Capability) algorithm, which can reduce the number of cases passed to the KNN algorithm, can improve efficiency...

chapter

Text Categorization Research Based on Cluster Idea

Jialun Lin, Xiaoling Li, Yuan Jiao

2010 Second International Workshop on Education Technology and Computer Science > 1 > 483 - 486

2010 2nd International Workshop on Education Technology and Computer Science (ETCS)

Classification and clustering are frequently used methods in data excavation technology. This paper introduces the idea of text clustering into the categorization algorithm study. The authors also attempt to use the text categorization pattern of self-initiated learning to design a clustering-based text categorization algorithm, for the purpose of reducing the dimension of training set and raising...

chapter

A MultiExpert Approach for Bayesian Network Structural Learning

F. Colace, M. De Santo, M. Vento

2010 43rd Hawaii International Conference on System Sciences > 1 - 11

2010 43rd Hawaii International Conference on System Sciences (HICSS-43)

The determination of a Bayesian network structure, especially in the case of wide domains, can often be complex, time consuming and imprecise. Therefore the interest of the scientific community in learning Bayesian network structure from data is increasing: many techniques or disciplines, such as data mining, text categorization, and ontology building, can take advantage of structural learning. In literature...

chapter

Research on Text Classification Algorithm by Combining Statistical and Ontology Methods

Guoshi Wu, Kaiping Liu

2009 International Conference on Computational Intelligence and Software Engineering > 1 - 4

2009 International Conference on Computational Intelligence and Software Engineering

Traditional statistics-based text classification methods mostly construct their characteristic vectors from some key terms, and they assume that terms are independent of each other and that there are no semantic relations among them. However, in the real world, words usually have semantic relationships, such as synonymy, hyponymy and so on. Therefore, classification methods based on statistics do not conform...

chapter

Multi-instance learning with relational information of instances

G. Herman, Getian Ye, Yang Wang, Jie Xu, et al.

2009 Workshop on Applications of Computer Vision (WACV) > 1 - 7

2009 Workshop on Applications of Computer Vision (WACV 2009)

Multi-instance learning (MIL) has many applications, including image and text categorization. One of the most effective approaches to MIL is by using support vector machines with multi-instance kernels. In this paper we propose a multi-instance kernel, called MIR-kernel, that takes into account the relational information of instances when computing similarities between bags. The relational information...

chapter

Naïve Bayes text classification with positive features selected by statistical method

M.J. Meena, K.R. Chandran

2009 First International Conference on Advanced Computing > 28 - 33

2009 First International Conference on Advanced Computing (ICAC 2009)

Text classification continues to be one of the most researched problems due to the continuously increasing amount of electronic documents and digital data. Naive Bayes is an effective and simple classifier for data mining tasks, but it does not show very satisfactory results in automatic text classification problems. In this paper, the performance of the naive Bayes classifier is analyzed by training the...

chapter

An Improved X² (CHI) Statistics Method for Text Feature Selection

Tang Yan, Xiao Ting

2009 International Conference on Computational Intelligence and Software Engineering > 1 - 4

2009 International Conference on Computational Intelligence and Software Engineering

Feature selection is a hot topic in the current research field, especially in the field of text categorization. To overcome the shortcomings of the traditional χ² (CHI) approach, an improved χ² (CHI) statistics method is proposed in this paper. It combines criteria such as Document Frequency and Class Accuracy from the traditional statistical methods to improve the χ² (CHI) statistical method. The...

chapter

Increasing the Accuracy of Discriminative of Multinomial Bayesian Classifier in Text Classification

T. Mouratis, S. Kotsiantis

2009 Fourth International Conference on Computer Sciences and Convergence Information Technology > 1246 - 1251

2009 Fourth International Conference on Computer Sciences and Convergence Information Technology

Text classification plays an important role in information extraction and summarization, text retrieval, and question-answering. The discriminative multinomial naive Bayes classifier has been a focus of research in the field of text classification. This paper increases the accuracy of discriminative multinomial Bayesian classifier with the usage of the feature selection technique that evaluates the...

chapter

Classifying Text with Statistically Selected Features to Closely Related Categories

M. Janaki Meena, K.R. Chandran

2009 International Conference on Advances in Recent Technologies in Communication and Computing > 297 - 301

2009 International Conference on Advances in Recent Technologies in Communication and Computing. ARTCom 2009

Text classification continues to be one of the most researched problems due to the continuously increasing amount of electronic documents and digital data. Classifying documents to closely related categories is the most complex task in text categorization. Feature selection is an essential preprocessing step for improving the efficiency and accuracy of the text classifiers by removing redundant and...

chapter

Multi-class bootstrapping learning aspect-related terms for aspect identification

Chunliang Zhang, Jingbo Zhu

2009 International Conference on Natural Language Processing and Knowledge Engineering > 1 - 6

2009 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

Aspect identification in entity reviews involving multiple aspects is a top priority for aspect-based opinion mining. Most previous studies adopted machine learning techniques, treating it as a multi-class text classification task. However, since building labeled training data is often expensive, some researchers have taken more interest in unsupervised techniques. With the subject of online restaurant reviews,...

chapter

Classifying non-gaussian and mixed data sets in their natural parameter space

C. Levasseur, U.F. Mayer, K. Kreutz-Delgado

2009 IEEE International Workshop on Machine Learning for Signal Processing > 1 - 6

2009 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009)

We consider the problem of both supervised and unsupervised classification for multidimensional data that are non-Gaussian and of mixed types (continuous and/or discrete). An important subclass of graphical model techniques called generalized linear statistics (GLS) is used to capture the underlying statistical structure of these complex data. GLS exploits the properties of exponential family distributions,...

chapter

A Text Classification Method with an Effective Feature Extraction Based on Category Analysis

Yun Li, Yan Sheng, Luan Luan, Ling Chen

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery > 1 > 95 - 99

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009)

Text classification refers to determining the class of an unknown text according to its content in the given classification system. In order to extract fewer features that express as much of the information in the text as possible, the paper analyzes the statistical properties of the various features and extracts the global features according to Zipf's law; and then, based on the statistical analysis of the...

chapter

Automatic Genre Classification by Using Co-training

Rui Liu, Minghu Jiang, Zheng Tie

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery > 1 > 129 - 132

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009)

Researchers have concentrated on topic-based text classification while the genre of a document is rarely considered. In this article, we discuss the automatic genre classification and its application. We argue that word level features and sentence level features are two important measures which vary in number among different genres. Word level features include word frequency and POS (part of speech)...

chapter

Studies of Comprehensive Auto-Indexing System Based on Key Words' Subject Degree

Liu Hua

2009 International Forum on Information Technology and Applications > 3 > 334 - 336

2009 International Forum on Information Technology and Applications (IFITA)

Key words are expressions that indicate and express the subject concept of a text; the major property of key words is to denote subject. Based on the domainal inhomogeneity and critical region of key words, subject degree is brought up and calculated by statistical model to cue the text's subject concept. Based on key words and their subject degree, constructed a comprehensive auto-indexing system,...

chapter

Document classification efficiency of phrase-based techniques

N. Kapalavayi, S.N.J. Murthy, Gongzhu Hu

2009 IEEE/ACS International Conference on Computer Systems and Applications > 174 - 178

2009 7th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA-2009)

Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain datasets. More recently, some of these techniques have been extended...

chapter

Fine Text Categorization: Using Very Aggressive Feature Selection to Cope with Mass Duplicated Features

Liuling Dai, Jinwu Hu, ShiKun Wu

2008 International Conference on Intelligent Computation Technology and Automation (ICICTA) > 2 > 984 - 988

2008 International Conference on Intelligent Computation Technology and Automation (ICICTA)

Text categorization is a key issue of text mining. Although there are many studies on this problem, the majority of them are focused on classification of rough categories. In this kind of problem, there are obviously different features that can differentiate one category from others. Only very few studies have addressed the fine text categorization (FTC) problem, which is characterized by many duplicated...

Keywords:
STATISTICAL ANALYSIS

Publication type

book (29)
article (1)

Keywords

TEXT ANALYSIS (26)
CLASSIFICATION ALGORITHMS (18)
TRAINING (13)
ACCURACY (11)
PATTERN CLASSIFICATION (11)
FEATURE EXTRACTION (10)
FEATURE SELECTION (10)
SUPPORT VECTOR MACHINES (10)
DATA MINING (8)
BAYES METHODS (7)
ALGORITHM DESIGN AND ANALYSIS (6)
MACHINE LEARNING (6)
TEXT CLASSIFICATION (6)
TEXT MINING (6)
LEARNING (ARTIFICIAL INTELLIGENCE) (5)
SUPPORT VECTOR MACHINE CLASSIFICATION (5)
TRAINING DATA (5)
CLASSIFICATION (4)
STATISTICAL METHOD (4)
COMPUTERS (3)
DATABASES (3)
ENTROPY (3)
INFORMATION RETRIEVAL (3)
NAIVE BAYES CLASSIFIER (3)
PATTERN CLUSTERING (3)
SUPPORT VECTOR MACHINE (3)
TESTING (3)
ARTIFICIAL NEURAL NETWORKS (2)
BAG-OF-WORDS (2)
BAYESIAN METHODS (2)
BELIEF NETWORKS (2)
CHI-SQUARE STATISTICS (2)
CHIR (2)
CLUSTERING ALGORITHMS (2)
DICTIONARIES (2)
DIGITAL DATA (2)
DOCUMENT CLASSIFICATION (2)
DOMAIN KNOWLEDGE RELATIONS (2)
FREQUENCY DOMAIN ANALYSIS (2)
GAIN (2)
INTERNET (2)
K-NEAREST NEIGHBOR (2)
KNN ALGORITHM (2)
KNOWLEDGE ENGINEERING (2)
MUTUAL INFORMATION (2)
NATURAL LANGUAGE PROCESSING (2)
ONTOLOGIES (ARTIFICIAL INTELLIGENCE) (2)
STATISTICAL METHODS (2)
TELECOMMUNICATIONS (2)
TEXT CLUSTERING (2)
TIME FREQUENCY ANALYSIS (2)
VECTORS (2)
WORD PROCESSING (2)
χ² CHI STATISTICS METHOD (1)
AEROSPACE ELECTRONICS (1)
AGGRESSIVE FEATURE SELECTION (1)
AMBIGUITY MEASURE (1)
ARRAYS (1)
ART (1)
ARTIFICIAL INTELLIGENCE (1)
ASPECT IDENTIFICATION (1)
ASPECT-RELATED TERMS (1)
AUTO-INDEXING SYSTEM (1)
AUTOMATIC GENRE CLASSIFICATION (1)
AUTOMATIC TEXT CATEGORIZATION (1)
AUTOMATIC TEXT CLASSIFICATION (1)
AVERAGE DOCUMENT MAPPING MODEL (1)
BAG OF WORDS (1)
BAYESIAN NETWORK STRUCTURAL LEARNING (1)
BAYESIAN THEORY (1)
BIGRAMS TECHNIQUE (1)
BISMUTH (1)
BOOTSTRAPPING (1)
BUILDINGS (1)
CANDIDATE SHIP (1)
CASE BASE MAINTENANCE LEARNING (1)
CASE-BASED REASONING (1)
CATEGORICAL DATA TEXT CATEGORIZATION (1)
CATEGORY ANALYSIS (1)
CATEGORY FREQUENCY (1)
CBM (1)
CBM LEARNING (1)
CHARACTER-LEVEL FREQUENT PATTERN EXTRACTION (1)
CHARACTER-LEVEL STATISTICAL METHOD (1)
CHI-SQUARE MAX METHOD (1)
CHI-SQUARED STATISTIC (1)
CHI2 STATISTIC (1)
CHINESE TEXT CATEGORIZATION (1)
CHINESE TEXT CLASSIFIER (1)
CHROMOSOME (1)
CLASS ACCURACY CRITERION (1)
CLASSICAL STATISTICAL TECHNIQUES (1)
CLASSIFICATION TREE ANALYSIS (1)
CLUSTER IDEA (1)
CLUSTERING-BASED TEXT CATEGORIZATION ALGORITHM (1)
CO-TRAINING (1)
COGNITIVE ABILITY (1)
COMPUTATIONAL LINGUISTICS (1)

INFONA - science communication portal
