Search results

Items from 81 to 100 out of 116 results

chapter

Research on Medical Document Categorization

Qirui Zhang, Yonggang Xue, Huaying Zhou, Jinghua Tan

2008 International Seminar on Future BioMedical Information Engineering > 437 - 440

2008 International Seminar on Future Biomedical Information Engineering (FBIE 2008)

Medical document categorization is the process of automatically assigning one or more predefined category labels to medical documents. Document indexing plays a very important role in the process of classification. This paper proposes an improved method of computing term weights which is called tfidfie (term frequency, inverted document frequency and inverted entropy). In comparison with the tfidf...
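The abstract compares tfidfie with the standard tfidf weighting. The truncated abstract does not describe the inverted-entropy factor, so the sketch below shows only the classic tfidf baseline. It assumes the common variant w(t, d) = tf(t, d) * log(N / df(t)). The function name `tfidf` and the toy documents are illustrative and are not the authors' code.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document,
    using w(t, d) = tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# hypothetical, already-tokenized medical snippets
docs = [
    ["fever", "cough", "fever"],
    ["cough", "x-ray"],
    ["fever", "rash"],
]
w = tfidf(docs)
for i, weights in enumerate(w):
    print(i, weights)
```

Under this variant, a term that appears in every document receives weight log(1) = 0.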

chapter

An Approach to Extracting Central URLs on Catalog Page

He Bai, JinLin Wang, Ye Li

2008 International Symposium on Knowledge Acquisition and Modeling > 388 - 392

2008 International Symposium on Knowledge Acquisition and Modeling (KAM)

Catalog pages form the intermediate layer in the architecture of a standard Web site; therefore, research on information retrieval for this kind of page can help improve a Web crawler's efficiency. A page is called "catalog-style" if its main body is displayed as a sequence of regular entries, and the central link in each entry apparently contains the page's major information...

chapter

Automatic Identification of Stop Words in Chinese Text Classification

Lili Hao, Lizhu Hao

2008 International Conference on Computer Science and Software Engineering > 1 > 718 - 722

2008 International Conference on Computer Science and Software Engineering (CSSE 2008)

Text classification is an active research area in information retrieval and natural language processing. A fundamental tool in text classification is a list of 'stop' words (a stop word list) that is used to identify frequent words that are unlikely to assist in classification and are hence deleted during pre-processing. To date, many stop word lists have been developed for the English language. However,...
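The abstract does not say how the authors identify stop words automatically. As a hypothetical illustration only, a common frequency-based heuristic flags terms that occur in a large fraction of the training documents. The function `candidate_stop_words` and its `min_doc_ratio` threshold are assumptions, and the documents are assumed to be already word-segmented.

```python
from collections import Counter

def candidate_stop_words(docs, min_doc_ratio=0.8):
    """Flag terms that occur in at least `min_doc_ratio` of the documents.
    Near-ubiquitous terms carry little information for telling classes apart."""
    n_docs = len(docs)
    # document frequency: count each term at most once per document
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(t for t, c in df.items() if c / n_docs >= min_doc_ratio)

# hypothetical word-segmented Chinese documents
docs = [
    ["的", "经济", "增长"],
    ["的", "体育", "比赛"],
    ["的", "经济", "政策"],
    ["了", "体育", "新闻", "的"],
]
print(candidate_stop_words(docs))  # ['的']
```

Lowering the threshold trades precision for recall: at 0.5, topical words such as 经济 and 体育 would also be flagged, which shows why a fixed cut-off alone is a crude criterion.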

chapter

One-against-one fuzzy support vector machine text categorization classifier

H.M. Chiang, T.Y. Wang

2008 IEEE International Conference on Industrial Engineering and Engineering Management > 1519 - 1523

2008 IEEE International Conference on Industrial Engineering and Engineering Management

The growth of information delivery over the Internet has made automatic text categorization essential. This investigation explores the challenges of multi-class text categorization using a one-against-one fuzzy support vector machine, with Reuters news as the example data. While fuzzy set theory is incorporated into the OAO-SVM in the classifying module, the influence of the samples with high uncertainty...
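The one-against-one (OAO) scheme trains one binary classifier per pair of classes, K(K-1)/2 in total, and combines their decisions by voting. The sketch below shows only that crisp voting step. The fuzzy membership the paper adds is not reproduced. Toy nearest-center pairwise classifiers stand in for trained binary SVMs, and the category names and centers are hypothetical.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(x, classes, pairwise):
    """Combine K*(K-1)/2 binary classifiers by majority vote.
    `pairwise[(a, b)]` is a function that returns either a or b for input x."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    # ties are broken by the order of `classes`
    return next(c for c in classes if votes[c] == best)

# hypothetical 1-D "documents": each class is represented by a center score
centers = {"earn": 0.0, "acq": 5.0, "grain": 10.0}
classes = list(centers)
# each binary classifier picks whichever of its two classes has the nearer center
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(classes, 2)
}
print(one_against_one_predict(6.0, classes, pairwise))  # acq
```

In plain OAO voting, ties and regions where no class wins a majority are resolved arbitrarily, here by class order.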

chapter

Tailoring Taxonomies for Efficient Text Categorization and Expert Finding

R. Wetzker, W. Umbrath, L. Hennig, C. Bauckhage, more

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology > 3 > 459 - 462

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring...

chapter

Two-Stage Model for Information Filtering

Xujuan Zhou, Yuefeng Li, P. Bruza, Yue Xu, more

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology > 3 > 685 - 689

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

This thesis presents a novel two-stage model that integrates the theories and techniques from the fields of information retrieval/filtering (IR/IF) and the fields of machine learning and data mining to provide more precise document filtering and retrieval. The first stage is topic filtering. The topic filtering stage is intended to minimize information mismatch by filtering out the most likely irrelevant...

chapter

Research on Ontology-Based Text Clustering

XiQuan Yang, DiNa Guo, XueYa Cao, JianYuan Zhou

2008 Third International Workshop on Semantic Media Adaptation and Personalization > 141 - 146

2008 Third International Workshop on Semantic Media Adaptation and Personalization

Text clustering, as a method of organizing retrieval results, can organize large numbers of web search results into a small number of clusters to help users browse them quickly. In this paper, we propose a text clustering method based on ontology, which differs from traditional text clustering and can improve clustering performance. This method implements word clustering by calculating...

chapter

TFIDF, LSI and multi-word in information retrieval and text categorization

Wen Zhang, T. Yoshida, Xijin Tang

2008 IEEE International Conference on Systems, Man and Cybernetics > 108 - 113

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Text representation, which is a fundamental and necessary process for text-based intelligent information processing, includes the tasks of determining the index terms for documents and producing the numeric vectors corresponding to the documents. In this paper, multi-word, which is regarded as containing more contextual semantics than individual words and as possessing favorable statistical characteristics,...
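The truncated abstract does not state how the paper extracts multi-words. A minimal, hypothetical sketch treats adjacent word pairs that repeat across sentences as multi-word candidates. `candidate_multiwords` and `min_count` are illustrative names. A fuller approach would likely add statistical association measures on top of the raw counts.

```python
from collections import Counter

def candidate_multiwords(sentences, min_count=2):
    """Return adjacent word pairs (as strings) that occur at least `min_count` times."""
    counts = Counter()
    for tokens in sentences:
        # zip(tokens, tokens[1:]) yields every adjacent (w_i, w_{i+1}) pair
        counts.update(zip(tokens, tokens[1:]))
    return {" ".join(pair): c for pair, c in counts.items() if c >= min_count}

# hypothetical tokenized sentences
sentences = [
    ["support", "vector", "machine", "training"],
    ["a", "support", "vector", "machine", "classifier"],
    ["vector", "space", "model"],
]
print(candidate_multiwords(sentences))
# {'support vector': 2, 'vector machine': 2}
```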

chapter

Information Extraction, Search, Interaction and Collaboration on the Web in Mexico

J.A. Sanchez, E. Chavez, M. Montes

2008 Latin American Web Conference > 156 - 164

2008 Latin American Web Conference (LA-WEB)

Web research in Mexico has been addressing issues related mainly to search mechanisms, information extraction, and mediating user interaction and group collaboration. In this paper we provide an overview of representative projects in the area and present a sample of recent advances by research groups in Mexican institutions. These include initiatives aimed at exploring extraction techniques that regard...

chapter

Feature Selection Method of Text Tendency Classification

Yanling Li, Guanzhong Dai, Gang Li

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery > 2 > 34 - 37

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Recently, automatic text categorization has made rapid progress and become one of the hot topics in the information processing field. Text tendency classification is one type of text categorization, with very important applications in information retrieval, bad information identification and filtering, content security management, and analysis of public opinion tendency. To aim at the important influence...

chapter

Automatic Information Extraction in Semi-structured Official Journals

V.M. Filho, R.B.C. Prudencio, F.A.T. de Carvalho, L.R. Torres, more

2008 10th Brazilian Symposium on Neural Networks > 51 - 56

2008 10th Brazilian Symposium on Neural Networks

Information extraction systems are used to extract only relevant text information from digital repositories. The current work proposes an automatic system to extract information from semi-structured official journals. In our approach, given an input document, a Machine Learning (ML) algorithm classifies the document's fragments into class labels which correspond to the data fields to be extracted...

chapter

A Text Feature Selection Algorithm Based on Improved TFIDF

Chengcheng Yang, Xingshi He

2008 Chinese Conference on Pattern Recognition > 1 - 4

2008 Chinese Conference on Pattern Recognition

In Chinese text categorization systems, for most classifiers using the vector space model (VSM), all attributes of documents form a high-dimensional feature space, and this high dimensionality is the bottleneck of categorization. TFIDF is one of the common methods used to measure the terms in a document. The method is simple, but it does not consider the unbalanced distribution of terms...

chapter

Topic generation for web document summarization

Heng-Yao Hsu, Chun-Wei Tsai, Ming-Chao Chiang, Chu-Sing Yang

2008 IEEE International Conference on Systems, Man and Cybernetics > 3702 - 3707

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Over the past decade, more and more Internet users have come to rely on search engines to help them find the information they need. However, the information they find depends, to a large extent, on the ranking mechanism of the search engines they use. Not surprisingly, the results generally contain a large amount of information that is completely irrelevant. To help users of the Internet find the information...

chapter

CCPR 2008 Keynote Speech 2

Chin-Hui Lee

2008 Chinese Conference on Pattern Recognition > 1

2008 Chinese Conference on Pattern Recognition

With an increasing amount of audio and video materials made available on the web, information extraction from multimedia documents is becoming a key area of growing business and technology interest. Research opportunities range from traditional topics, such as multimedia signal representation, processing, coding, modeling, authentication, and recognition, to emerging subjects, such as language modeling,...

chapter

Hidden Markov Models and Text Classifiers for Information Extraction on Semi-Structured Texts

F.A. Barros, E.F.A. Silva, R.B.C. Prudencio, V.M. Filho, more

2008 Eighth International Conference on Hybrid Intelligent Systems > 417 - 422

2008 8th International Conference on Hybrid Intelligent Systems (HIS)

Information extraction (IE) aims to extract from textual documents only the fragments which correspond to data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach for IE that combines text classifiers and hidden Markov models (HMM). In this approach, a text classifier technique generates an initial output, which is refined by an HMM,...
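The abstract describes an HMM that refines a text classifier's initial output. One way to sketch such a refinement uses Viterbi decoding. This assumes the classifier's per-fragment labels are treated as HMM observations and the true data fields as hidden states. The states, probabilities, and the `viterbi` function are hypothetical. Every probability must be nonzero because the code works in log space.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence (log-space Viterbi)."""
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # choose the predecessor that maximizes the path score into s
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]

states = ["title", "body"]
start_p = {"title": 0.9, "body": 0.1}
trans_p = {"title": {"title": 0.5, "body": 0.5},
           "body": {"title": 0.1, "body": 0.9}}
# emit_p[true_field][classifier_label]: the toy classifier is right 80% of the time
emit_p = {"title": {"title": 0.8, "body": 0.2},
          "body": {"title": 0.2, "body": 0.8}}
classifier_labels = ["title", "body", "title", "body"]
print(viterbi(classifier_labels, states, start_p, trans_p, emit_p))
# ['title', 'body', 'body', 'body']
```

With these toy probabilities, the isolated third label "title" is smoothed to "body", because a body-to-title transition is given a low probability.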

chapter

News Keyword Extraction for Topic Tracking

Sungjick Lee, Han-Joon Kim

2008 Fourth International Conference on Networked Computing and Advanced Information Management > 2 > 554 - 559

2008 Fourth International Conference on Networked Computing and Advanced Information Management (NCM)

This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that give a high-level description of its contents to readers. Identifying keywords from a large amount of on-line news data is very useful in that it can produce a short summary of news articles. As on-line text documents rapidly increase...

chapter

Measuring the representativeness of index terms in literary texts: an experiment on the Quran

Hayati Abd Rahman, Shahrul Azman Noah, Hector Jimenez-Salazar

2008 International Symposium on Information Technology > 2 > 1 - 5

2008 International Symposium on Information Technology

Concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in many applications such as information retrieval, document browsing and document classification. One of the important tasks in the construction of concept hierarchy is the identification of suitable terms with appropriate size of domain vocabulary. One way of achieving such a size is by using...

chapter

Semantic Foraging in Defined Contexts

D. Carmichael, B. Swart

2008 10th IEEE Conference on E-Commerce Technology and the Fifth IEEE Conference on Enterprise Computing, E-Commerce and E-Services > 445 - 452

10th IEEE Conference on E-Commerce Technology (CEC'08) and the Fifth IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE'08)

An experimental prototype system was created and used to investigate how information relevant to analyst queries, and constrained by a contextual model, can be found over a large information space. Agents employing the ant model sift through documents quickly using a transductive support machine classifier and return those meeting a classifier which is constantly refined through feedback from semantic...

chapter

Cross-subject page ranking based on text categorization

Jianmei Huang, Guoren Wang, Zhiqiong Wang

2008 International Conference on Information and Automation > 363 - 368

2008 International Conference on Information and Automation (ICIA)

With the development of the Internet, there is an enormous number of web pages, so a good page ranking algorithm is critical for users to obtain good results. The traditional ranking method is suitable for general search engines, but not for focused search engines or categorization-based search engines. With the state of the art in text categorization, many cross-subjects appear, and the...

chapter

Ambiguity in text mining

H.M. Al Fawareh, S. Jusoh, W.R.S. Osman

2008 International Conference on Computer and Communication Engineering > 1172 - 1176

2008 International Conference on Computer and Communication Engineering

Text Mining tasks include text categorization, text clustering, concept/entity extraction, document summarization, and entity relation modeling. In this paper, the focus is on concept/entity extraction only. The major challenge in extracting concepts/entities from texts is that natural language words are always ambiguous. Up to now, not much research in text mining especially in concept/entity...

Keywords:
INFORMATION RETRIEVAL
Publication type:
book

INFONA - science communication portal
