Search results

Items from 1 to 7 out of 7 results

chapter

Text Clustering Based on Domain Ontology and Latent Semantic Analysis

Yaxiong Li, Jianqiang Zhang, Dan Hu

2010 International Conference on Asian Language Processing, pp. 219-222

2010 International Conference on Asian Language Processing (IALP 2010)

One key step in text mining is the categorization of texts, i.e., putting texts with the same or similar content into one group so as to distinguish them from texts with different content. However, traditional word-frequency-based statistical approaches, such as the vector space model (VSM), fail to reflect the complex meanings in texts. This paper introduces a domain ontology and constructs a new conceptual vector space model in...
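For reference, the "traditional word-frequency-based" baseline that the abstract contrasts with can be sketched as raw term-frequency vectors compared by cosine similarity. This is a minimal illustration assuming lowercase whitespace tokenization; it is not the paper's ontology-based conceptual model, and the function names `tf_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = tf_vector("car engine repair")
d2 = tf_vector("automobile engine repair")
d3 = tf_vector("stock market news")
print(round(cosine(d1, d2), 3))  # 0.667
print(cosine(d1, d3))            # 0.0
```

Because "car" and "automobile" are treated as unrelated terms, the two near-synonymous documents score 0.667 rather than 1.0, which illustrates the kind of semantic gap the abstract attributes to word-frequency models.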

chapter

A refined weighted K-Nearest Neighbors algorithm for text categorization

Fang Lu, Qingyuan Bai

2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering, pp. 326-330

2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2010)

Text categorization is an important text-mining task, used for the automated classification of large numbers of documents. Many useful supervised learning methods have been introduced to the field of text classification. Among them, the K-Nearest Neighbor (KNN) algorithm is widely used and is one of the best text classifiers owing to its simplicity and efficiency. For text categorization,...
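The abstract is cut off before it describes the refinement, so the sketch below shows only a common baseline: KNN over term-frequency vectors, where each of the k most similar training documents votes for its label with a weight equal to its cosine similarity to the query. This is a generic similarity-weighted KNN, not the authors' refined weighting; `knn_classify`, the tokenization, and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Term-frequency vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3):
    """Label `query` using (text, label) pairs in `training`."""
    if not training or k < 1:
        raise ValueError("need a non-empty training set and k >= 1")
    q = tf_vector(query)
    # Score every training document by its similarity to the query.
    scored = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in training),
        reverse=True,
    )
    # Each of the k nearest neighbours votes with weight = its similarity.
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


training = [
    ("goal scored in the football match", "sports"),
    ("the team won the match", "sports"),
    ("stock prices fell on the market", "finance"),
    ("the market rallied as shares rose", "finance"),
]
print(knn_classify("the football team won", training))      # sports
print(knn_classify("shares fell on the market", training))  # finance
```

Weighting votes by similarity, rather than counting them equally, lets a very close neighbour outweigh several distant ones; refined KNN variants typically adjust exactly this weighting step.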

chapter

Internet Public Opinion Hotspot Detection and Analysis Based on Kmeans and SVM Algorithm

Hong Liu

2010 International Conference of Information Science and Management Engineering, vol. 1, pp. 257-261

2010 International Conference of Information Science and Management Engineering (ISME 2010)

The rapid progress of the network has drawn much attention to Internet public opinion, so it is important to grasp Internet public opinion in time and to understand its trends correctly. Text mining plays a fundamental role in the categorization and monitoring of Internet public opinion, but processing Internet public opinion is much more difficult than processing pure text because of its semi-structured characteristic...

chapter

Using top n Recognition Candidates to Categorize On-line Handwritten Documents

S.P. Saldarriaga, E. Morin, C. Viard-Gaudin

2009 10th International Conference on Document Analysis and Recognition, pp. 881-885

2009 10th International Conference on Document Analysis and Recognition (ICDAR)

The traditional weighting schemes used in text categorization for the vector space model (VSM) cannot exploit information intrinsic to texts obtained through online handwriting recognition or any OCR process. In particular, the top n (n > 1) recognition candidates cannot be used without flooding the resulting text with false occurrences of spurious terms. In this paper, an improved weighting scheme...

chapter

Categorization and Monitoring of Internet Public Opinion Based on Latent Semantic Analysis

Yuan Wan, Hengqing Tong

2008 International Seminar on Business and Information Management, vol. 2, pp. 121-124

2008 International Seminar on Business and Information Management (ISBIM 2008)

The rapid progress of the network has drawn much attention to Internet public opinion. To address this issue, we propose a novel system for the categorization and monitoring of Internet public opinion. Because of the textual format of Internet public opinion and the semantic relationships between words in such documents, we introduce latent semantic analysis (LSA) to represent public-opinion documents. Compared to the...

chapter

A novel feature weight algorithm for text categorization

Wenqian Shang, Hongbin Dong, Haibin Zhu, Yongbin Wang

2008 International Conference on Natural Language Processing and Knowledge Engineering, pp. 1-7

2008 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

With the development of the Web, large numbers of documents are being published on the Internet, and more and more digital libraries, news sources and internal company data are becoming available. Automatic text categorization is therefore increasingly important for handling massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM). The result of text preprocessing...

chapter

TFIDF, LSI and multi-word in information retrieval and text categorization

Wen Zhang, T. Yoshida, Xijin Tang

2008 IEEE International Conference on Systems, Man and Cybernetics, pp. 108-113

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Text representation, a fundamental and necessary process for text-based intelligent information processing, includes the tasks of determining the index terms for documents and producing the numeric vectors corresponding to the documents. In this paper, multi-words, which are regarded as containing more contextual semantics than individual words and as possessing favorable statistical characteristics,...
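The title names TF-IDF as one of the compared representations. Below is a minimal sketch of the textbook weight w(t, d) = tf(t, d) × log(N / df(t)) over whitespace-tokenized documents, producing one sparse numeric vector per document. The paper's exact variant (smoothing, normalization, multi-word index terms) is not given here, so this unsmoothed single-word form and the function name `tfidf` are assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term count times inverse document frequency.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


docs = ["text mining text", "text categorization", "image processing"]
vecs = tfidf(docs)
print({t: round(w, 3) for t, w in vecs[0].items()})
# {'text': 0.811, 'mining': 1.099}
```

Note that a term occurring in every document receives weight log(1) = 0, so it contributes nothing to similarity; this is the intended down-weighting of uninformative index terms.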

Filter options

Keywords:
DATA MINING
VECTOR SPACE MODEL

Keywords

CLASSIFICATION ALGORITHMS (4)
INTERNET (3)
SUPPORT VECTOR MACHINE CLASSIFICATION (3)
SUPPORT VECTOR MACHINES (3)
TEXT MINING (3)
VECTORS (3)
FEATURE EXTRACTION (2)
FREQUENCY MEASUREMENT (2)
INDEXES (2)
INTERNET PUBLIC OPINION (2)
LATENT SEMANTIC ANALYSIS (2)
MACHINE LEARNING (2)
MATRIX DECOMPOSITION (2)
PATTERN CLUSTERING (2)
TEXT CLUSTERING (2)
TRAINING (2)
-INTERNET PUBLIC OPINION (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ANALYTICAL MODELS (1)
CLASSIFICATION (1)
CLUSTERING ALGORITHMS (1)
CLUSTERING ANALYSIS (1)
CONCEPT-TEXT MATRIX (1)
CONTEXTUAL SEMANTICS (1)
DOCUMENT CLASSIFICATION (1)
DOCUMENT IMAGE PROCESSING (1)
DOMAIN ONTOLOGY (1)
ENTROPY (1)
FEATURE WEIGHT ALGORITHM (1)
HANDWRITING RECOGNITION (1)
HANDWRITTEN CHARACTER RECOGNITION (1)
HOTSPOT ANALYSIS (1)
HOTSPOT DETECTION (1)
INDEXING (1)
INFORMATION RETRIEVAL (1)
K-NEAREST NEIGHBORS ALGORITHM (1)
KMEANS ALGORITHM (1)
KMEANS CLUSTERING (1)
KNN (1)
LARGE SCALE INTEGRATION (1)
LATENT SEMANTIC ANALYSI (1)
LATENT SEMANTIC INDEXING (1)
LEARNING (ARTIFICIAL INTELLIGENCE) (1)
LEXICON-TEXT MATRIX (1)
LSI (1)
MATRIX ALGEBRA (1)
MONITORING (1)
MULTI-WORD (1)
NOISE (1)
NOISE MEASUREMENT (1)
NOISY TEXT CATEGORIZATION (1)
OCR PROCESS (1)
ONLINE HANDWRITTEN DOCUMENT CATEGORIZATION (1)
ONTOLOGIES (1)
ONTOLOGIES (ARTIFICIAL INTELLIGENCE) (1)
OPTICAL CHARACTER RECOGNITION (1)
PATTERN CLASSIFICATION (1)
POSTERIOR PROBABILITY (1)
PROBABILITY (1)
PROBABILITY DENSITY FUNCTION (1)
PUBLIC OPINION DOCUMENT (1)
RECOGNITION CANDIDATE (1)
RECOGNITION CANDIDATES (1)
RESCALING FACTOR (1)
SEMANTICS (1)
SOCIAL ASPECTS OF AUTOMATION (1)
SOCIAL SCIENCES COMPUTING (1)
SUPERVISED LEARNING METHOD (1)
SVM (1)
SVM ALGORITHM (1)
SVM CLASSIFIER (1)
TERM FREQUENCY INVERSE DOCUMENT FREQUENCY (1)
TEXT CLASSIFICATION (1)
TEXT OPINION ANALYSIS (1)
TEXT PREPROCESSING (1)
TEXT RECOGNITION (1)
TEXT REPRESENTATION (1)
TEXT-BASED INTELLIGENT INFORMATION PROCESSING (1)
TF*IDF (1)
TF-GINI (1)
TFIDF (1)
TRADITIONAL INDEXING METHODS (1)
WEB SITE (1)
WEBSITE (1)
WEIGHT CALCULATION (1)
WEIGHT MEASUREMENT (1)
WEIGHTING SCHEME (1)
WORLD WIDE WEB (1)

INFONA - science communication portal