Items from 1 to 20 out of 51 results

chapter

A hybrid algorithm for text classification based on rough set

Weibin Deng

2011 3rd International Conference on Computer Research and Development > 1 > 406 - 410

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

Text classification has become one of the key subjects in intelligent information processing. Owing to the complex features of natural language, the dimensionality of the feature space is particularly high. Improving the accuracy of text classification is an important and hard problem. As rough set is a useful tool to deal with uncertain information, a hybrid algorithm for text classification...

chapter

A novel text classification based on Mahalanobis distance

Suli Zhang, Xin Pan

2011 3rd International Conference on Computer Research and Development > 3 > 156 - 158

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

In the text mining field, KNN (K Nearest Neighbors) is one of the oldest and simplest methods of text classification. However, it is known to be sensitive to the distance (or similarity) function used in classifying a test instance; this disadvantage can cause low classification accuracy and limit the use of the KNN classifier in text classification. In this paper, we introduce Mahalanobis...
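The abstract above is truncated, so the authors' actual method is not shown here. As a rough illustration of the general idea only, the following sketch plugs a Mahalanobis distance into a plain KNN vote. It assumes 2-D feature vectors so that the covariance matrix can be inverted by hand; all function names are hypothetical and chosen for this example.

```python
import math

def covariance_2d(rows):
    """Sample covariance matrix of a list of 2-D points (assumes len(rows) > 1)."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def inverse_2d(c):
    """Closed-form inverse of a 2x2 matrix."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det, c[0][0] / det]]

def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T * inv_cov * (x - y)) for 2-D vectors."""
    d0, d1 = x[0] - y[0], x[1] - y[1]
    q = (d0 * (inv_cov[0][0] * d0 + inv_cov[0][1] * d1)
         + d1 * (inv_cov[1][0] * d0 + inv_cov[1][1] * d1))
    return math.sqrt(q)

def knn_predict(train, labels, query, k, inv_cov):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(range(len(train)),
                    key=lambda i: mahalanobis(train[i], query, inv_cov))
    votes = {}
    for i in ranked[:k]:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)
```

With the identity matrix as `inv_cov`, the distance reduces to the ordinary Euclidean distance, which makes the sensitivity of KNN to the choice of distance function easy to probe by swapping the matrix.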

chapter

Classification of brand names based on n-grams

P Warintarawej, A Laurent, P Pompidor, B Laurent

2010 International Conference of Soft Computing and Pattern Recognition > 12 - 17

2010 International Conference of Soft Computing and Pattern Recognition (SoCPaR 2010)

Supervised classification has been extensively addressed in the literature as it has many applications, especially for text categorization or web content mining where data are organized through a hierarchy. On the other hand, the automatic analysis of brand names can be viewed as a special case of text management, although such names are very different from classical data. They are indeed often neologisms,...

chapter

Improving Arabic document categorization: Introducing local stem

Eiman Tamah Al-Shammari

2010 10th International Conference on Intelligent Systems Design and Applications > 385 - 390

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

Stemming is a fundamental step in processing textual data preceding the tasks of text mining, Information Retrieval (IR), and natural language processing (NLP). The common goal of stemming is to standardize words by reducing a word to its base (root or stem), so it can also be considered a feature reduction technique. This paper aims at presenting a new dictionary-free, content-based Arabic stemmer...

chapter

Data Imbalance Problem in Text Classification

Yanling Li, Guoshe Sun, Yehang Zhu

2010 Third International Symposium on Information Processing > 301 - 305

2010 Third International Symposium on Information Processing (ISIP 2010)

Aiming at the ever-present problem of imbalanced data in text classification, the authors study several forms of imbalanced data, such as text number, class size, subclass and class fold. Some useful conclusions are drawn from a series of related experiments: first, when the two classes contain almost the same number of texts, the difference in word number becomes the major factor affecting the accuracy...

chapter

Improved Feature Selection Algorithm Based on Concentration and Dispersion

Shen You-Wen, Zhao Xin-Jian

2010 International Conference on Web Information Systems and Mining > 1 > 262 - 265

2010 International Conference on Web Information Systems and Mining (WISM 2010)

This paper analyzes the concentration and dispersion of the integrated feature selection algorithm (TFFS), and finds their shortcomings: it is difficult for concentration to measure the weight of low-frequency terms; dispersion ignores the impact of terms whose mutual information is negative. It proposes a modified feature selection algorithm (TFFSL), which makes certain improvements on concentration...

chapter

English and Taiwanese text categorization using N-gram based on Vector Space Model

M Suzuki, N Yamagishi, Yi-Ching Tsai, T Ishida, more

2010 International Symposium On Information Theory & Its Applications > 106 - 111

2010 International Symposium On Information Theory & Its Applications (ISITA 2010)

In this paper, we present a new mathematical model based on a “Vector Space Model” and consider its implications. The proposed method is evaluated by performing several experiments. In these experiments, we classify newspaper articles from the English Reuters-21578 data set, and Taiwanese China Times 2005 data set using the proposed method. The Reuters-21578 data set is a benchmark data set for automatic...

chapter

Co-training based algorithm for datasets without the natural feature split

J Slivka, A Kovačević, Z Konjović

IEEE 8th International Symposium on Intelligent Systems and Informatics > 279 - 284

2010 IEEE 8th International Symposium on Intelligent Systems and Informatics (SISY 2010)

The performance of a classification model depends not only on the algorithm by which the model is learned, but also on the training set. Manual annotation of the training data is a tedious and time-consuming job. In order to overcome the problem of laborious hand-labeling of a large training set, a set of techniques called semi-supervised learning was designed. Co-training is one of the major semi-supervised...

chapter

Stock price prediction using financial news articles

M I Y Kaya, M E Karsligil

2010 2nd IEEE International Conference on Information and Financial Engineering > 478 - 482

2010 2nd IEEE International Conference on Information and Financial Engineering (ICIFE 2010)

Stock price prediction is one of the most important issues investigated in academic and financial research. Data mining techniques are frequently used in studies aimed at solving this problem. In this paper we investigate predicting stock prices using financial news articles. A prediction model, finding and analyzing correlation between contents of news articles and stock prices and...

chapter

Random Subspace Method in Text Categorization

Mehrdad J Gangeh, Mohamed S Kamel, Robert P W Duin

2010 20th International Conference on Pattern Recognition > 2049 - 2052

2010 20th International Conference on Pattern Recognition (ICPR 2010)

In text categorization (TC), which is a supervised technique, a feature vector of terms or phrases is usually used to represent the documents. Due to the huge number of terms in even a moderate-size text corpus, a high-dimensional feature space is an intrinsic problem in TC. Random subspace method (RSM), a technique that divides the feature space into smaller ones each submitted to a (base) classifier...
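The abstract above is cut off, so the following is only a minimal sketch of the random subspace idea as described so far: each base classifier sees a random subset of the features. The nearest-centroid base classifier, the majority vote used to combine predictions, and all function names are assumptions made for this illustration, not details taken from the paper.

```python
import random
from collections import Counter

def train_centroids(X, y, features):
    """Per-class mean vector, restricted to the given feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict_centroid(centroids, row, features):
    """Label of the closest class centroid within the feature subspace."""
    def sqdist(centroid):
        return sum((row[f] - centroid[j]) ** 2
                   for j, f in enumerate(features))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def train_rsm(X, y, n_models, subspace_size, seed=0):
    """Train one base classifier per randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble

def predict_rsm(ensemble, row):
    """Combine the base classifiers by majority vote (an assumed rule)."""
    votes = Counter(predict_centroid(centroids, row, features)
                    for features, centroids in ensemble)
    return votes.most_common(1)[0][0]
```

Here each row of `X` stands for a document's term-weight vector. Because every base classifier works on only `subspace_size` features, each one faces a much smaller feature space than the full vocabulary.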

chapter

A new feature weighting method based on probability distribution in imbalanced text classification

Leilei Chu, Hui Gao, Wenbo Chang

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2335 - 2339

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented and their classifiers often perform far below a satisfactory level. We propose a new approach using a probability distribution to assign the feature weight and apply it to a Naive Bayes classifier. The method is evaluated in our experiments on FuDan Chinese Corpus. The experimental...

chapter

An empirical evaluation of linear and nonlinear kernels for text classification using Support Vector Machines

Ya Gao, Shiliang Sun

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1502 - 1505

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

This paper compares the performance of linear and nonlinear kernels of Support Vector Machines (SVM) used for text classification. The study is motivated by the earlier view that a linear SVM performs better than a nonlinear one, and by the fact that, although many investigations have shown that SVM performs well in text classification, there is no serious investigation on the comparison between...

chapter

An event ontology construction approach to web crime mining

Li Cunhua, Hu Yun, Zhong Zhaoman

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2441 - 2445

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Along with the rapid growth in popularity of the Internet, crime information on the web is becoming increasingly rampant, and the majority of it is in the form of text. Because a lot of crime information in documents is described through events, event-based semantic technology can be used to study the patterns and trends of web-oriented crimes. In our research project on cyber crime mining, we construct...

chapter

A two-stage feature selection method for text categorization

Jiana Meng, Hongfei Lin

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1492 - 1496

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Feature selection for text classification is a well-studied problem whose goals are improving classification effectiveness, computational efficiency, or both. In this paper, we propose a two-stage feature selection algorithm based on a kind of feature selection method and latent semantic indexing. Traditional word-matching based text categorization systems use the vector space model to represent the...

chapter

Evolved Apache Lucene SpanFirst queries are good text classifiers

L Hirsch

IEEE Congress on Evolutionary Computation > 1 - 8

2010 IEEE Congress on Evolutionary Computation

Human readable text classifiers have a number of advantages over classifiers based on complex and opaque mathematical models. For some time now search queries or rules have been used for classification purposes, either constructed manually or automatically. We have performed experiments using genetic algorithms to evolve text classifiers in search query format with the combined objective of classifier...

chapter

On the Design of Learning Objects Classifiers

Marcelo Mendoza, Carlos Becerra

2010 10th IEEE International Conference on Advanced Learning Technologies > 464 - 468

2010 IEEE 10th International Conference on Advanced Learning Technologies (ICALT 2010)

An important limitation of learning object repositories is that they frequently provide incomplete or imperfect information to describe the resources that they index. One way of dealing with this limitation is to categorize the learning objects in a taxonomy that allows the main themes covering each of these resources to be identified. In this paper, we will explore two techniques to categorize learning...

chapter

Text Classification Based on Ant Colony Optimization

Lijuan Jiao, Liping Feng

2010 Third International Conference on Information and Computing > 3 > 229 - 232

Third International Conference on Information and Computing Science (ICIC 2010)

This paper proposes a new text classification algorithm based on the Ant Colony Algorithm. It exploits the strength of ACO in solving discrete problems and the discreteness of text document features. Texts are classified by class-population ants, which carry class information, crawling to find an optimal matching path during the iterations of the algorithm. It can get a satisfactory...

chapter

Computer Assisted Assessment (CAA) of Free-Text: Literature Review and the Specification of an Alternative CAA System

N N Karanikolas

2010 19th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises > 116 - 118

2010 19th IEEE International Workshop On Enabling Technologies: Infrastructures For Collaborative Enterprises (WETICE)

So far there are a number of computer assisted assessment approaches that are based on a variety of features. However, those approaches exploit the whole set of training documents in order to assess a provided free-text answer against a given question. Recent text classification approaches are orientated to mine average class documents and consequently they provide cheap classification methods that...

chapter

Use semantic meaning of coreference to improve classification text representation

Ziqiang Li, Mingtian Zhou

2010 2nd IEEE International Conference on Information Management and Engineering > 416 - 420

2010 2nd IEEE International Conference on Information Management and Engineering (ICIME 2010)

On large-scale datasets, the performance of automatic text classification is still far from perfect. It is commonly agreed that more text semantics should be incorporated into the text representation to meet this challenge. This paper introduces the semantic meaning of coreference to improve the traditional BOW representation. The result of the text classification experiment shows that, contrasted...

chapter

Text categorization algorithms representations based on inductive learning

Cao Jian-fang, Wang Hong-bin

2010 2nd IEEE International Conference on Information Management and Engineering > 352 - 355

2010 2nd IEEE International Conference on Information Management and Engineering (ICIME 2010)

Text categorization (the assignment of natural language texts to one or more predefined categories based on their content) is an important component in many information organization and management tasks. The categorization algorithm is the most critical factor in text categorization system performance. Inductive learning classifiers are put forward. Very accurate text categorization results can be learned...

Keywords:
ACCURACY
PATTERN CLASSIFICATION
TEXT CATEGORIZATION


INFONA - science communication portal
