Search results

Items from 1 to 20 out of 24 results

chapter

Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification

Tinh Thanh Dao, Tinh Dao Thanh, Thanh Nguyen Hai, Vinh Ho Ngoc

2015 Fifth International Conference on Communication Systems and Network Technologies > 1284 - 1288

2015 Fifth International Conference on Communication Systems and Network Technologies (CSNT)

In a language, the occurrence of words indicates the meaning of a text's content. Generative models for text, such as the topic model, have the potential to make important contributions to the statistical analysis of large document collections, and the development of a deeper understanding of human language learning and processing. In this paper, we proposed a novel method for building Vietnamese...

chapter

Job Opportunity Mining by Text Categorization

Shilin Zhang, Mei Gu

2010 2nd International Conference on Information Engineering and Computer Science > 1 - 4

2010 2nd International Conference on Information Engineering and Computer Science (ICIECS)

Text classification is an important field of research. There are a number of approaches to classifying text documents. However, improving computational efficiency and recall remains an important challenge. In this paper, we propose a novel framework to segment Chinese words, generate word vectors, train on the corpus and make predictions. Based on the text classification technology, we successfully...

chapter

Chinese Web Text Classification System Model Based on Naive Bayes

Gong Zheng, Yu Tian

2010 International Conference on E-Product E-Service and E-Entertainment > 1 - 4

2010 International Conference on E-Product E-Service and E-Entertainment (ICEEE 2010)

Web text classification is the process of automatically determining text types under a given classification, according to the text content. A web text categorization system uses machine learning, knowledge engineering and other related fields of knowledge to access text on the web and, after text preprocessing, Chinese word segmentation and classifier training, using classification algorithm...

chapter

Determination of Bloom's cognitive level of question items using artificial neural network

Norazah Yusof, Chai Jing Hui

2010 10th International Conference on Intelligent Systems Design and Applications > 866 - 870

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

We propose a classification model for the cognitive level of question items in examinations based on Bloom's taxonomy. The model implements the artificial neural network approach, which is trained using the scaled conjugate gradient learning algorithm. Several data preprocessing techniques such as word extraction, stop word removal, stemming, and vector representation are applied to a feature set...

chapter

LJParser: LING-JOIN web search & text mining development platform

Yiwei Wang

2010 4th International Universal Communication Symposium > 407

2010 4th International Universal Communication Symposium (IUCS 2010)

LJParser is a development platform for web search and mining. It is middleware by LING-JOIN Software, which is well known for over ten years of expertise in natural language understanding and web search. LJParser provides powerful modules including precise search for multiple languages, new word detection, Chinese word segmentation and POS tagging, language modeling and term translation, text clustering,...

chapter

Text2arff: Automatic feature extraction software for Turkish texts

M F Amasyali, F Davletov, A I Torayew, Ümit Çiftçi

2010 IEEE 18th Signal Processing and Communications Applications Conference > 629 - 632

2010 IEEE 18th Signal Processing and Communications Applications Conference (SIU 2010)

Which features are the most important for the text classification tasks? In the automatic text categorization area, several studies seek answers to this question. In this paper, a feature extraction tool for Turkish texts (Text2arff) is presented. The toolbox automatically extracts several features such as the frequencies of the words and ngrams, word clustering, Latent semantic indexing etc. The...

chapter

Applying latent semantic analysis to classify emotions in Thai text

P Inrak, S Sinthupinyo

2010 2nd International Conference on Computer Engineering and Technology > 6 > V6-450 - V6-454

2010 2nd International Conference on Computer Engineering and Technology (ICCET)

With the rapid growth of internet communication, many types of text are produced. They can convey meanings that contribute to text categorization. Emotion classification is also attracting more interest, but emotions in Thai text still cannot be classified correctly. Thus, this paper proposes a novel approach that takes advantage of bi-words occurrence to classify emotion...

chapter

Text Categorization Research Based on Cluster Idea

Jialun Lin, Xiaoling Li, Yuan Jiao

2010 Second International Workshop on Education Technology and Computer Science > 1 > 483 - 486

2010 2nd International Workshop on Education Technology and Computer Science (ETCS)

Classification and clustering are frequently used methods in data excavation technology. This paper introduces the idea of text clustering into the categorization algorithm study. The authors also attempt to use the text categorization pattern of self-initiated learning to design a clustering-based text categorization algorithm, for the purpose of reducing the dimension of training set and raising...

chapter

A new topic-bridged model for transfer learning

Meng-Sung Wu, Jen-Tzung Chien

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 5346 - 5349

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In real-world information systems, there are abundant unlabeled data but sparse labeled data. It is challenging to construct an adaptive model to classify a large amount of documents containing different domains. The classifiers trained from a source domain shall perform poorly for the test data in a target domain due to the domain mismatch. In this study, we build a topic-bridged latent Dirichlet...

chapter

Commodity Classification in Hierarchies

Jie Shen, Cang Chen, Ying Gao

2009 International Conference on Wireless Networks and Information Systems > 267 - 269

2009 International Conference on Wireless Networks and Information Systems (WNIS 2009)

In e-commerce transactions, goods are classified according to a hierarchical structure, which refers to a tree of categories. In the process of classification, we shall consider the special features. When a brand name is used for categorization, for instance, the brand is a more distinguishing feature. Based on this, we prepare a dictionary of brands for Chinese word segmentation on one hand...

chapter

The Application Research of Topic Word List In Text Automatic Classification

Huan Huang, Qingtang Liu, Linjing Wu, Tao Huang, et al.

2009 Second International Symposium on Knowledge Acquisition and Modeling > 2 > 111 - 114

2009 Second International Symposium on Knowledge Acquisition and Modeling (KAM 2009)

When traditional text classification technologies classify academic dissertations, the dimension of the extracted feature terms is high, and the terms cannot represent the theme of the thesis. This makes the efficiency very low and the accuracy rate poor. The topic words are small in quantity and can reflect the theme of the thesis well. Accordingly, the paper proposes to extract the topic words with topic...

chapter

A Text Classification Method with an Effective Feature Extraction Based on Category Analysis

Yun Li, Yan Sheng, Luan Luan, Ling Chen

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery > 1 > 95 - 99

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009)

Text classification refers to determining the class of an unknown text according to its content in the given classification system. In order to extract fewer features that express as much of the information in the text as possible, the paper analyzes the statistical properties of the various features and extracts the global features according to Zipf's law; and then, based on the statistical analysis of the...

chapter

Hierarchical Text Categorization Based on Multiple Feature Selection and Fusion of Multiple Classifiers Approaches

Mei-ying Jia, De-quan Zheng, Bing-ru Yang, Qing-xuan Chen

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery > 1 > 192 - 196

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009)

Hierarchical text categorization refers to assigning one or more suitable categories from a hierarchical category space to a document. In this paper, we used a hierarchical feature selection method and multiple classifiers for the hierarchical text categorization task. Experiments showed that the methods we used were effective, compared with flat classification, top-down level-based approach with the...

chapter

Categorization of news articles using neural text categorizer

Taeho Jo

2009 IEEE International Conference on Fuzzy Systems > 19 - 22

2009 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)

This research proposes the application of NTC (neural text categorizer) for categorizing news articles. Although research on text categorization has progressed considerably, documents must still be encoded into numerical vectors. Such encoding causes two main problems: huge dimensionality and sparse distribution. The idea of this research as the solution to the problems is to encode documents...

chapter

Automatic Tamil Content Generation

S. Kohilavani, T. Mala, T.V. Geetha

2009 International Conference on Intelligent Agent & Multi-Agent Systems > 1 - 6

2009 International Conference on Intelligent Agent & Multi-Agent Systems (IAMA 2009)

Automatic content generation aims at developing an intelligent tutoring system in the Tamil language. This system focuses on delivering personalized content in Tamil to individual users based on their learning abilities and interests. This paper deals with automatic classification of Tamil documents and also the information extraction from those documents to construct the knowledge base...

chapter

Towards More Effective Text Summarization Based on Textual Association Networks

Yuhui Tao, Shuigeng Zhou, Wai Lam, Jihong Guan

2008 Fourth International Conference on Semantics, Knowledge and Grid > 235 - 240

2008 Fourth International Conference on Semantics, Knowledge and Grid (SKG)

This paper proposes new text summarization approaches based on textual unit association networks. Textual units refer to words, phrases, sentences, or paragraphs. Intuitively, textual units containing much co-occurrence information are semantically more salient in a document. We construct two kinds of textual association networks, namely, word-based association network and sentence-based association...

chapter

A Novel Hybrid system for Large-Scale Chinese Text Classification Problem

Zhong Gao, Guanming Lu, Daquan Gu

2008 Japan-China Joint Workshop on Frontier of Computer Science and Technology > 121 - 124

2008 Japan-China Joint Workshop on Frontier of Computer Science and Technology

Most Chinese text classification systems are based on the bag-of-words (BW) technique, which is a valid probability tool for text representation and can provide a better semantic architecture. But its weakness in classification accuracy has still not been overcome. Support vector machine (SVM) has become a popular classification tool and can be applied in the scheme, but the main disadvantages...

chapter

A Feature Selection Method based on Improved TFIDF

Wei Yong-qing, Liu Pei-yu, Zhu Zhen-fang

2008 Third International Conference on Pervasive Computing and Applications > 1 > 94 - 97

2008 Third International Conference on Pervasive Computing and Applications (ICPCA08)

Feature selection is a valid method to reduce the dimension of vectors in a text categorization system. After analyzing several common evaluation functions for feature selection, we applied a term weight function to feature selection. A new evaluation function based on an improved TFIDF method is presented; in this function the category information is introduced to feature items, and the feature items of...

chapter

Using Linguistic Information to Classify Portuguese Text Documents

T. Goncalves, P. Quaresma

2008 Seventh Mexican International Conference on Artificial Intelligence > 94 - 100

2008 Seventh Mexican International Conference on Artificial Intelligence (MICAI)

This paper examines the role of various linguistic structures on text classification applying the study to the Portuguese language. Besides using a bag-of-words representation where we evaluate different measures and use linguistic knowledge for term selection, we do several experiments using syntactic information representing documents as strings of words and strings of syntactic parse trees. To...

chapter

Using SentiWordNet for multilingual sentiment analysis

K. Denecke

2008 IEEE 24th International Conference on Data Engineering Workshop > 507 - 512

2008 IEEE 24th International Conference on Data Engineering Workshop (ICDE Workshop)

This paper introduces a methodology for determining the polarity of text within a multilingual framework. The method leverages lexical resources for sentiment analysis available in English (SentiWordNet). First, a document in a language other than English is translated into English using standard translation software. Then, the translated document is classified according to its sentiment into one...

Keywords:
WORD PROCESSING
