Search results

Items from 1 to 20 out of 133 results

chapter

Autonomous website categorization with pre-defined dictionary

Adsadawut Chanakitkarnchok, Kulit Na Nakorn, Kultida Rojviboolchai

2016 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) > 1 - 6

2016 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)

In this era of emerging technology, the number of websites is increasing dramatically. The content and categories of information are overflowing the Internet. Finding the right information among almost a billion websites is considerably hard, and finding accurate, high-quality information is even harder. Hence, the demand for website categorization is increasing tremendously. Unfortunately, the...

chapter

A novel text classification based on Mahalanobis distance

Suli Zhang, Xin Pan

2011 3rd International Conference on Computer Research and Development > 3 > 156 - 158

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

In the text mining field, KNN (K Nearest Neighbors) is one of the oldest and simplest methods of text classification. However, it is known to be sensitive to the distance (or similarity) function used in classifying a test instance; this disadvantage can cause low classification accuracy and limit the KNN classifier's use in text classification. In this paper, we introduce Mahalanobis...

chapter

A new approach for text feature selection based on OWA operator

M A Ghaderi, N Yazdani, B Moshiri, Maryam Tayefeh Mahmoudi

2010 5th International Symposium on Telecommunications > 579 - 583

2010 5th International Symposium on Telecommunications (IST)

Feature selection has a significant role in the precision of text classification algorithms. In this regard, various approaches exist, such as Information Gain, Chi Square, Document Frequency, and Mutual Information. Combining some input features may help improve classification effectiveness. In this paper, a new approach based on Ordered-Weighted Averaging (OWA) is proposed for...

chapter

A New Search Engine Filtering Scheme Based on Improved Neural Network and Ontology

Zhuocong Song, Xiaopen Cheng

2010 International Conference on Computational and Information Sciences > 178 - 181

2010 International Conference on Computational and Information Sciences (ICCIS 2010)

Current search engines are not very effective in filtering out harmful information since the technology they use for filtering is often based on traditional text classification in which texts are often classified according to feature words. To improve the effectiveness of filtering, in this paper, we propose a new filtering scheme in which we combine the neural network and ontology categorization...

chapter

A new feature selection method based on distributional information for Text Classification

Nianyun Shi, Lingling Liu

2010 IEEE International Conference on Progress in Informatics and Computing > 1 > 190 - 194

2010 International Conference on Progress in Informatics and Computing (PIC 2010)

Feature Selection (FS) is one of the most important issues in Text Classification (TC). A good feature selection can improve the efficiency and accuracy of a text classifier. Based on the analysis of the feature's distributional information, this paper presents a feature selection method named DIFS. In DIFS a new estimation mechanism is proposed to measure the relevance between feature's distribution...

chapter

A Method of Text Feature Extraction Based on Weighted Scatter Difference

Liu Haifeng, Su Zhan, Yao Zeqing, Zhang Xueren

2010 Second WRI Global Congress on Intelligent Systems > 3 > 83 - 86

2010 Second WRI Global Congress on Intelligent Systems (GCIS 2010)

Feature reduction is one of the core technologies of automatic text categorization. Under the scatter difference criterion, categorization performs poorly when the between-class distance is small and the class density is high. To solve this problem, the paper presents a weighted method based on the sample distribution, which will make the between-class and within-class scatter matrices...

chapter

Search-based short-text classification

Kang Wei, Ruiquan Zhang, Xinguo Xu

5th International Conference on Pervasive Computing and Applications > 297 - 301

2010 5th International Conference on Pervasive Computing and Applications (ICPCA 2010)

Since the traditional classification algorithm does not work well for short-text classification, we propose a search-based method employing the Naïve Bayes classification algorithm. This paper describes the whole process, including the classification algorithms, training, and evaluation. The results indicate that the classifier performs better than other methods.

chapter

Study on Key Technology for Topic Tracking

Shengdong Li, Xueqiang Lv, Hongwei Wang, Shuicai Shi

2010 Sixth International Conference on Semantics, Knowledge and Grids > 275 - 280

2010 Sixth International Conference on Semantics Knowledge and Grid (SKG 2010)

Text classification is the key technology for topic tracking, and the vector space model (VSM) is one of the simplest and most effective models for topic representation. On the basis of the K-nearest neighbor (KNN) algorithm and the support vector machines (SVM) algorithm for text classification, we have studied how they affect topic tracking. Then we get the variation law that they affect...

chapter

Improving Arabic document categorization: Introducing local stem

Eiman Tamah Al-Shammari

2010 10th International Conference on Intelligent Systems Design and Applications > 385 - 390

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

Stemming is a fundamental step in processing textual data, preceding the tasks of text mining, Information Retrieval (IR), and natural language processing (NLP). The common goal of stemming is to standardize words by reducing a word to its base (root or stem), and thus it can also be considered a feature reduction technique. This paper aims at presenting a new dictionary-free, content-based Arabic stemmer...

chapter

Data Imbalance Problem in Text Classification

Yanling Li, Guoshe Sun, Yehang Zhu

2010 Third International Symposium on Information Processing > 301 - 305

2010 Third International Symposium on Information Processing (ISIP 2010)

Aiming at the ever-present problem of imbalanced data in text classification, the authors study several forms of imbalanced data, such as text number, class size, subclass and class fold. Some useful conclusions are drawn from a series of related experiments: first, when the two classes contain almost the same number of texts, the difference in word count becomes the major factor affecting the accuracy...

chapter

Study on Multi-layer Fusion Classification Model of Multi-media Information

Xiao-Dan Zhang

2010 International Conference on Web Information Systems and Mining > 1 > 216 - 218

2010 International Conference on Web Information Systems and Mining (WISM 2010)

For higher text classification precision, a general fusion classification model and algorithm are proposed, based on the model theory of information fusion and adopting multimedia information on the network. The model includes two layers. One is the feature layer, which handles different media information with different classification algorithms and inputs the classification results into the higher...

chapter

Using Text Categorization to Find Job Opportunities

Shilin Zhang, Mei Gu

2010 International Conference on Web Information Systems and Mining > 1 > 25 - 29

2010 International Conference on Web Information Systems and Mining (WISM 2010)

Text Classification is an important field of research. There are a number of approaches to classify text documents. However, there is an important challenge to improve the computational efficiency and recall. In this paper, we propose a novel framework to segment Chinese words, generate word vectors, train the corpus and make prediction. Based on the text classification technology, we successfully...

chapter

An Improved Algorithm to Term Weighting in Text Classification

Ran Li, Xianjiu Guo

2010 International Conference on Multimedia Technology > 1 - 3

2010 International Conference on Multimedia Technology (ICMT)

The traditional TF-IDF algorithm is a common method that is used to measure feature weight in text categorization. However, the algorithm doesn't take the distribution of feature terms in inter-class and intra-class into consideration. Consequently, the algorithm can't effectively weigh the distribution proportion of feature items. In order to solve this problem, information entropy in inter-class...

chapter

Improved Feature Selection Algorithm Based on Concentration and Dispersion

Shen You-Wen, Zhao Xin-Jian

2010 International Conference on Web Information Systems and Mining > 1 > 262 - 265

2010 International Conference on Web Information Systems and Mining (WISM 2010)

This paper analyzes the concentration and dispersion of the integrated feature selection algorithm (TFFS), and finds their shortcomings: it is difficult for concentration to measure the weight of low-frequency terms; dispersion ignores the impact of terms whose mutual information is negative. We propose a modified feature selection algorithm (TFFSL), which makes certain improvements on concentration...

chapter

A k-nearest neighbor text classification algorithm based on fuzzy integral

Xianfei Zhang, Bicheng Li, Xianzhu Sun

2010 Sixth International Conference on Natural Computation > 5 > 2228 - 2231

2010 Sixth International Conference on Natural Computation (ICNC)

This paper presents a k-nearest neighbor text classification algorithm based on fuzzy integral. It regards the k nearest training samples as k pieces of evidence and fuses them using fuzzy integral, which avoids the independence requirement of D-S theory and improves the performance of text classification. Experiments compare the new method with improved kNN algorithms and other text classification algorithms, which result...

chapter

An empirical evaluation of linear and nonlinear kernels for text classification using Support Vector Machines

Ya Gao, Shiliang Sun

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1502 - 1505

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

This paper compares the performance of linear and nonlinear kernels of Support Vector Machines (SVM) used for text classification. The study is motivated by the previous viewpoint that a linear SVM performs better than a nonlinear one, and that, although many investigations have shown that SVM performs well in text classification, there is no serious investigation on the comparison between...

chapter

A Semi-supervised Text Classification Method Based on Incremental EM Algorithm

Xinghua Fan, Zhiyi Guo

2010 WASE International Conference on Information Engineering > 2 > 211 - 214

2010 WASE International Conference on Information Engineering (ICIE 2010)

In standard EM-based semi-supervised text classification, the classification performance is poor when there are only a few initial labeled samples. How to improve the performance is an important issue. In view of this, a semi-supervised method based on an incremental EM algorithm is proposed. This method makes full use of the useful information of the intermediate classifier. On the one hand, this method...

chapter

Study on Key Technology of Topic Tracking Based on SVM

Shengdong Li, Xueqiang Lv, Yuqin Li, Shuicai Shi

2010 WASE International Conference on Information Engineering > 2 > 11 - 14

2010 WASE International Conference on Information Engineering (ICIE 2010)

Text classification is the key technology for topic tracking, and the vector space model (VSM) is one of the simplest and most effective models for topic representation. On the basis of VSM and support vector machines (SVM), we have studied how feature space dimension in VSM as well as linearly separable and non-separable SVM affect topic tracking. Then we get the variation law that they affect topic tracking,...

chapter

Semi-automated feature selection for web text filtering

Ying Chen, Ou Wu

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 6 > 2513 - 2517

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

The explosive growth of the Internet inevitably leads to the proliferation of harmful information such as pornography, drugs, and violence. Many filtering techniques based on image and text categorization have been proposed in the literature. Among them, text-based filtering plays a leading role because of its good performance. Existing text filtering algorithms can be seen as a classical text categorization...

chapter

Improvement and Application of TF•IDF Method Based on Text Classification

Qiaoyan Kuang, Xiaoming Xu

2010 International Conference on Internet Technology and Applications > 1 - 4

2010 International Conference on Internet Technology and Applications (iTAP 2010)

Feature extraction is an important prerequisite for classifying text effectively and automatically. TF·IDF is widely used to express the text feature weight, but it has some problems. TF·IDF can't reflect the distribution of terms in the text, and so can't reflect the degree of importance and the difference between categories. This paper proposes a new feature weighting method, TF·IDF·C_i, to which a...

Keywords:
TEXT CATEGORIZATION
TEXT ANALYSIS

Content availability

Available (127)
None (6)

Publication type

book (127)
article (6)

Keywords

CLASSIFICATION ALGORITHMS (85)
TRAINING (67)
PATTERN CLASSIFICATION (61)
FEATURE EXTRACTION (45)
SUPPORT VECTOR MACHINES (43)
CLASSIFICATION (42)
ACCURACY (38)
SUPPORT VECTOR MACHINE CLASSIFICATION (35)
DATA MINING (29)
FEATURE SELECTION (25)
MACHINE LEARNING (25)
LEARNING (ARTIFICIAL INTELLIGENCE) (24)
NATURAL LANGUAGE PROCESSING (21)
SUPPORT VECTOR MACHINE (21)
SVM (17)
ALGORITHM DESIGN AND ANALYSIS (16)
BAYES METHODS (16)
INFORMATION RETRIEVAL (15)
INTERNET (11)
VECTOR SPACE MODEL (11)
COMPUTERS (10)
KERNEL (10)
ARTIFICIAL NEURAL NETWORKS (9)
TEXT MINING (9)
COMPUTATIONAL MODELING (8)
MUTUAL INFORMATION (8)
FILTERING (7)
KNN (7)
SEMANTICS (7)
BAYESIAN METHODS (6)
INFORMATION FILTERING (6)
ONTOLOGIES (6)
PATTERN CLUSTERING (6)
STATISTICAL ANALYSIS (6)
WEB PAGES (6)
BIOLOGICAL SYSTEM MODELING (5)
CLUSTERING ALGORITHMS (5)
DICTIONARIES (5)
DIMENSION REDUCTION (5)
FEATURE SELECTION METHOD (5)
INDEXING (5)
MACHINE LEARNING ALGORITHMS (5)
NAIVE BAYES CLASSIFIER (5)
NATURAL LANGUAGES (5)
NIOBIUM (5)
ONTOLOGIES (ARTIFICIAL INTELLIGENCE) (5)
OPTIMIZATION (5)
PROBABILITY (5)
PROTOTYPES (5)
TEXT CLASSIFICATION ALGORITHM (5)
TEXT REPRESENTATION (5)
VECTORS (5)
ARTIFICIAL INTELLIGENCE (4)
AUTOMATIC TEXT CLASSIFICATION (4)
CONFERENCES (4)
CORRELATION (4)
DATA MODELS (4)
DATABASES (4)
ELECTRONIC MAIL (4)
ENTROPY (4)
FEATURE SELECTION ALGORITHM (4)
INDEXES (4)
INFORMATION EXTRACTION (4)
INFORMATION MANAGEMENT (4)
LEARNING SYSTEMS (4)
OPTIMISATION (4)
ROUGH SET THEORY (4)
SUPERVISED LEARNING (4)
TDT EVALUATION (4)
TFIDF (4)
TOPIC TRACKING (4)
WEB SITES (4)
WORD PROCESSING (4)
ANALYTICAL MODELS (3)
CLASSIFICATION ALGORITHM (3)
CLASSIFICATION TREE ANALYSIS (3)
COMPUTER SCIENCE (3)
COVARIANCE MATRIX (3)
DIMENSIONALITY REDUCTION (3)
FEATURE WEIGHT (3)
HEURISTIC ALGORITHMS (3)
HIDDEN MARKOV MODELS (3)
INFORMATION PROCESSING (3)
INVERSE DOCUMENT FREQUENCY (3)
NOISE (3)
ONTOLOGY (3)
PRESSES (3)
PROBABILISTIC LOGIC (3)
SEMI-SUPERVISED LEARNING (3)
SPAM FILTERING (3)
TAGGING (3)
TERM WEIGHTING (3)
TEXT PROCESSING (3)
TEXT RETRIEVAL (3)
TRAINING DATA (3)
TREES (MATHEMATICS) (3)
UNSOLICITED E-MAIL (3)

INFONA - science communication portal
