Search results

Items from 21 to 40 out of 64 results

chapter

An automatic indexing technique for Thai texts using frequent max substring

T. Chumwatana, Kok Wai Wong, Hong Xie

2009 Eighth International Symposium on Natural Language Processing > 67 - 72

2009 Eighth International Symposium on Natural Language Processing. SNLP 2009

Thai is considered a non-segmented language: words are written as a string of symbols without explicit word boundaries, and the structure of written Thai is highly ambiguous. This makes indexing a main issue in Thai text retrieval. To construct an inverted index for Thai texts, an index-term extraction technique is usually required to segment texts...
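The abstract does not spell out how frequent max substrings are mined. The sketch below assumes one common reading: a substring becomes an index term if it occurs at least `min_freq` times and is maximal, meaning it is not always extended by the same neighboring character. No word segmenter is needed, which is the point for unsegmented Thai. The function names, length limits, and thresholds are hypothetical, and this is not the authors' algorithm.

```python
from collections import defaultdict

def substring_counts(docs, min_len, max_len):
    """Count (overlapping) occurrences of every substring of min_len..max_len characters."""
    counts = defaultdict(int)
    for text in docs:
        for i in range(len(text)):
            for n in range(min_len, max_len + 1):
                if i + n > len(text):
                    break
                counts[text[i:i + n]] += 1
    return counts

def frequent_max_substrings(docs, min_freq=2, min_len=2, max_len=8):
    """Assumed definition: frequent (count >= min_freq) and maximal substrings."""
    # Count one extra length so substrings of length max_len can be checked for maximality.
    counts = substring_counts(docs, min_len, max_len + 1)
    non_maximal = set()
    for t, c in counts.items():
        if len(t) > min_len:
            # If t occurs exactly as often as its prefix (or suffix), every occurrence
            # of that shorter string lies inside t, so the shorter string is not maximal.
            for sub in (t[:-1], t[1:]):
                if counts.get(sub) == c:
                    non_maximal.add(sub)
    return {s for s, c in counts.items()
            if len(s) <= max_len and c >= min_freq and s not in non_maximal}

def build_inverted_index(docs, terms):
    """Map each index term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in terms:
            if term in text:
                index[term].add(doc_id)
    return dict(index)
```

For example, with `docs = ["abcabc", "abcx"]`, `min_len=2`, and `max_len=3`, only `abc` survives: `ab` and `bc` are frequent but every occurrence of them is part of `abc`.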

chapter

Extracting historical terms based on aligned Chinese-English parallel corpora

Xiuying Li, Chao Che, Limin Han, Xiaoxia Liu

2009 International Conference on Natural Language Processing and Knowledge Engineering > 1 - 6

2009 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

This paper examines the feasibility of applying statistics-oriented term extraction and evaluation methods to extracting historical terms from aligned parallel corpora of Chinese historical classics and their translations. It proposes using transliterations as anchor points to establish sentence-level alignment. It also investigates an approach to extracting term translation pairs based on 4000...
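Once sentence pairs are aligned, a statistical association score can rank candidate term translation pairs. The sketch below uses the Dice coefficient over sentence-level co-occurrence, a common choice that is assumed here rather than taken from the paper. It also assumes the sentence alignment and the candidate term lists already exist, that English candidates are lowercase, and it matches Chinese terms by plain substring search.

```python
from collections import Counter
from itertools import product

def dice_scores(aligned_pairs, zh_terms, en_terms):
    """Score (Chinese term, English term) pairs by the Dice coefficient of their
    co-occurrence across aligned (Chinese sentence, English sentence) pairs."""
    zh_freq, en_freq, joint = Counter(), Counter(), Counter()
    for zh_sent, en_sent in aligned_pairs:
        zh_here = {t for t in zh_terms if t in zh_sent}
        en_here = {t for t in en_terms if t in en_sent.lower()}
        zh_freq.update(zh_here)
        en_freq.update(en_here)
        joint.update(product(zh_here, en_here))
    # Dice = 2 * |both| / (|zh| + |en|); only pairs that ever co-occur are scored.
    return {(z, e): 2 * joint[(z, e)] / (zh_freq[z] + en_freq[e]) for (z, e) in joint}
```

Pairs that always appear together score 1.0; a threshold on the score would then select translation pairs.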

chapter

Entity relation extraction to free text

Suxiang Zhang

2009 International Conference on Natural Language Processing and Knowledge Engineering > 1 - 5

2009 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

This paper proposes a novel approach to entity relation extraction. Unlike previous approaches, it treats syntactic knowledge extraction as a separate stage, which automatically extracts characteristic words and patterns using hierarchical bootstrapping machine learning. It advocates using a small amount of seed information and a large collection of easily obtained unlabeled...

chapter

Chinese text orientation analysis based on phrase

Ye Guo, Yanquan Zhou

2009 International Conference on Natural Language Processing and Knowledge Engineering > 1 - 6

2009 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

Semantic orientation analysis of a sentiment word aims to determine its polarity and degree, including its original, dynamic, and modified orientation. In this paper, we correct the orientation in different contexts using dependency relations and a set of rules. The results show that accuracy and recall are substantially improved.
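To illustrate how a word's original orientation can be modified by context, here is a minimal rule-based sketch. It assumes pre-segmented tokens and uses hypothetical toy lexicons. It also swaps in a simpler device than the paper's: modifiers are taken from the preceding tokens rather than from dependency relations.

```python
# Hypothetical toy lexicons; a real system would use curated sentiment resources.
PRIOR = {"好": 1.0, "坏": -1.0}          # original orientation of sentiment words
NEGATORS = {"不", "没"}                  # flip polarity
INTENSIFIERS = {"很": 1.5, "非常": 2.0}  # scale degree

def phrase_orientation(tokens):
    """Sum the prior polarity of each sentiment word after applying the negators
    and intensifiers that precede it since the previous sentiment word."""
    total, modifier = 0.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            modifier = -modifier
        elif tok in INTENSIFIERS:
            modifier *= INTENSIFIERS[tok]
        elif tok in PRIOR:
            total += modifier * PRIOR[tok]
            modifier = 1.0  # modifiers apply only to the next sentiment word
    return total
```

With these lexicons, `["不", "很", "好"]` ("not very good") scores -1.5, while `["很", "好"]` scores 1.5.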

chapter

Theme detection an exploration of opinion subjectivity

A. Das, S. Bandyopadhyay

2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops > 1 - 6

2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009)

Work in opinion mining and classification often assumes that incoming documents are opinionated. Opinion mining systems make false hits when they attempt to compute polarity values for non-subjective or factual sentences or documents. It is therefore imperative to decide whether a given document contains subjective information, and to identify which portions of the document are subjective...

chapter

Text classification in the Turkish marketing domain for context sensitive ad distribution

Melih Engin, T. Can

2009 24th International Symposium on Computer and Information Sciences > 105 - 110

2009 24th International Symposium on Computer and Information Sciences (ISCIS)

In this paper, we construct and compare several feature extraction approaches to find a better solution for classifying Turkish Web documents in the marketing domain. We design our feature extraction techniques around the characteristics of the Turkish language, the structure of Web documents, and online content in the marketing domain. We form datasets in different feature spaces and apply...

chapter

Generic text summarization for Turkish

C. Cigir, M. Kutlu, I. Cicekli

2009 24th International Symposium on Computer and Information Sciences > 224 - 229

2009 24th International Symposium on Computer and Information Sciences (ISCIS)

In this paper, we propose a generic text summarization method that generates summaries of Turkish texts by ranking sentences according to scores calculated from their surface-level features and extracting the highest-ranked ones from the original documents. To extract sentences that form a summary with extensive coverage of the main content of the text and little redundancy, we use the...
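A minimal sketch of extractive summarization by surface-feature scoring with a redundancy check. The two features (mean term frequency and sentence position), their equal weighting, and the Jaccard overlap threshold are all assumptions; the abstract is truncated before naming the paper's own features.

```python
import re
from collections import Counter

def jaccard(a, b):
    """Word-set overlap used as a simple redundancy measure."""
    return len(a & b) / (len(a | b) or 1)

def summarize(text, k=2, max_overlap=0.5):
    """Rank sentences by assumed surface features and greedily keep the top k,
    skipping any sentence too similar to one already selected."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    tf = Counter(w for ws in tokens for w in ws)

    def score(i):
        # Assumed features: mean document-level frequency of the sentence's words
        # plus a bonus that decreases with the sentence's position.
        ws = tokens[i]
        term_score = sum(tf[w] for w in ws) / len(ws) if ws else 0.0
        return term_score + (1.0 - i / len(sentences))

    chosen = []
    for i in sorted(range(len(sentences)), key=score, reverse=True):
        if all(jaccard(set(tokens[i]), set(tokens[j])) < max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # keep original order
```

Note that Python's `str.lower()` does not apply the Turkish dotted/dotless i rules, so real Turkish input would need locale-aware case folding.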

chapter

Basics of Concepts Representation for Document Summarization

S.C. Suh, S.I. Saffer, S.G. Anaparthi, N.M. Sirakov

2009 Fifth International Joint Conference on INC, IMS and IDC > 1374 - 1380

2009 Fifth International Joint Conference on INC, IMS and IDC

This paper presents three different ways to describe the notion of a concept. The first uses the idea of hierarchy and employs a graph to define the connections between attributes and concepts. To enable the generation, manipulation, and measurement of concepts, a matrix model is developed. Thus, the entire space of terms can be generated by a set of (linearly) independent terms over a numerical field. The...

chapter

Enhanced Algorithm for Extracting the Root of Arabic Words

S. Ghwanmeh, G. Kanaan, R. Al-Shalabi, S. Rabab'ah

2009 Sixth International Conference on Computer Graphics, Imaging and Visualization > 388 - 391

2009 Sixth International Conference on Computer Graphics, Imaging and Visualization (CGIV 2009)

Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch problem, in which query words do not match document words. Arabic stemming does not fit the usual mold: most stemming research in other languages has so far relied only on removing prefixes and suffixes from a word, but Arabic words contain infixes as well.
In this paper...
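To show why infixes matter, here is a deliberately small root-extraction sketch. It strips one prefix and one suffix, then removes a long-vowel infix from four-letter stems. The affix lists and the two templates handled are hypothetical simplifications, not the authors' enhanced algorithm, and weak or hollow roots will come out wrong.

```python
# Hypothetical, deliberately small affix lists (longest prefixes first).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ان", "ة", "ه"]
LONG_VOWELS = "اوي"

def extract_root(word):
    """Strip one prefix and one suffix (keeping at least three letters),
    then remove a long-vowel infix from four-letter stems."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    if len(word) == 4:
        if word[1] == "ا":          # template fa'il, e.g. كاتب -> كتب
            return word[0] + word[2] + word[3]
        if word[2] in LONG_VOWELS:  # templates fi'al / fa'ul / fa'il, e.g. كتاب -> كتب
            return word[0] + word[1] + word[3]
    return word
```

Both الكاتب ("the writer") and والكتاب ("and the book") reduce to the root كتب, which affix stripping alone would not achieve.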

chapter

A simplified application of regular expressions: With the extraction of Chinese cultural terms as an example

Yao Zhenjun, Ji Xiangyu

2009 ISECS International Colloquium on Computing, Communication, Control, and Management > 1 > 439 - 442

2009 ISECS International Colloquium on Computing, Communication, Control, and Management (CCCM)

This article aims to solve the problem of extracting cultural terms and their corresponding English translations from heterogeneous translations of the ancient Chinese classics. As a text-processing tool, regular expressions can match patterns in patterned text. This research focuses on designing target-oriented regular expressions to fit the pattern...
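A minimal sketch of a target-oriented regular expression, assuming a hypothetical layout in which a Chinese term is immediately followed by its English gloss in half-width or full-width parentheses, for example 孔子（Confucius）. Real translated texts vary, so the pattern would have to be adapted to each source's layout.

```python
import re

# Assumed layout: a run of CJK ideographs, then an English gloss in ( ) or （ ）.
TERM_PAIR = re.compile(r"([\u4e00-\u9fff]+)\s*[（(]\s*([A-Za-z][A-Za-z '\-]*?)\s*[）)]")

def extract_term_pairs(text):
    """Return (Chinese term, English translation) tuples found in the text."""
    return TERM_PAIR.findall(text)
```

Because the first group captures the whole preceding run of ideographs, the term must be separated from earlier text by punctuation or whitespace; otherwise neighboring characters are swept into the term.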

chapter

Automatic extraction of definitions

Chunxia Zhang, Peng Jiang

2009 2nd IEEE International Conference on Computer Science and Information Technology > 364 - 368

2009 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2009)

The task of definition extraction aims to acquire definitions of terms from texts. It is a subtask of terminology extraction, ontology construction, semantic relation learning, question answering, and so on. This paper presents a bootstrapping approach to automatically extracting definitions of domain-specific terms from unannotated Chinese free texts. Experimental results in three domains of...

chapter

Matchmaking Using Natural Language Descriptions: Linking Customers with Enterprise Service Descriptions

J. Geldart, W. Song, Yang Li

2009 33rd Annual IEEE International Computer Software and Applications Conference > 2 > 376 - 379

2009 33rd Annual IEEE International Computer Software and Applications Conference (COMPSAC 2009)

A novel architecture is presented for matching Web services based on interpretation graphs extracted from natural language text. The graph of each candidate service is compared to that of the query using a numerical node-node similarity calculation based on the structure of the graphs. The similarity score of each candidate's best alignment with the query can then be used to rank the candidates.
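The ranking step can be sketched as follows, assuming each graph is a dict from node label to the set of neighboring node labels. The node similarity used here (an equal mix of label-word overlap and neighbor-label overlap) is hypothetical; the paper's structural calculation is not given in the abstract. Exhaustive alignment search is only practical for small graphs.

```python
from itertools import permutations

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def node_similarity(g1, n1, g2, n2):
    """Hypothetical node-node similarity: average of label-word overlap and
    neighbor-label overlap, so graph structure contributes to the score."""
    label = jaccard(set(n1.split()), set(n2.split()))
    neighbors = jaccard(g1[n1], g2[n2])
    return 0.5 * label + 0.5 * neighbors

def best_alignment_score(query, candidate):
    """Try every one-to-one mapping of query nodes onto candidate nodes and return
    the best total similarity. If the candidate has fewer nodes than the query,
    no full mapping exists and 0.0 is returned."""
    q_nodes = list(query)
    best = 0.0
    for perm in permutations(candidate, len(q_nodes)):
        best = max(best, sum(node_similarity(query, q, candidate, c)
                             for q, c in zip(q_nodes, perm)))
    return best

def rank_services(query, services):
    """Sort service names by their best-alignment score against the query graph."""
    return sorted(services, key=lambda name: best_alignment_score(query, services[name]),
                  reverse=True)
```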

chapter

Automatic topic detection strategy for information retrieval in spoken document

Shan Jin, H. Misra, T. Sikora, J. Jose

2009 10th Workshop on Image Analysis for Multimedia Interactive Services > 300 - 303

2009 10th Workshop on Image Analysis for Multimedia Interactive Services. WIAMIS 2009

This paper suggests an alternative solution for spoken document retrieval (SDR). The proposed system runs retrieval on multi-level transcriptions (word and phone) produced by word and phone recognizers, respectively, and combines their outputs. We propose using a latent Dirichlet allocation (LDA) model to capture semantic information in the word transcriptions. The LDA model is employed...
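The abstract says the word-level and phone-level outputs are combined but not how. A common option, assumed here, is linear interpolation of min-max normalized retrieval scores; the weight `alpha` is a hypothetical parameter, and the LDA component is not shown.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse(word_scores, phone_scores, alpha=0.7):
    """Interpolate normalized word-level and phone-level scores and return doc IDs,
    best first. A document missing from one list gets 0 for that list."""
    w, p = normalize(word_scores), normalize(phone_scores)
    fused = {d: alpha * w.get(d, 0.0) + (1 - alpha) * p.get(d, 0.0) for d in set(w) | set(p)}
    return sorted(fused, key=fused.get, reverse=True)
```

Normalizing first matters because word and phone recognizers feed retrieval engines whose raw scores are on different scales.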

chapter

Word stretching for effective segmentation and classification of historical Arabic handwritten documents

Z. Al Aghbari, S. Brook

2009 Third International Conference on Research Challenges in Information Science > 217 - 224

2009 Third IEEE International Conference on Research Challenges in Information Science (RCIS)

There is a growing need to access historical Arabic handwritten (HAH) manuscripts stored in large archives; therefore, tools for automatically searching, indexing, classifying, and retrieving HAH manuscripts are required. The peculiar characteristics of Arabic handwriting add an extra dimension of difficulty to developing such systems. This paper presents...

chapter

The Research of Chinese Automatic Word Segmentation In Hierarchical Model Dictionary Binary Tree

Luo XianGang, Luo Jin, Xie Zhong

2009 First International Workshop on Database Technology and Applications > 321 - 324

2009 First International Workshop on Database Technology and Applications, DBTA

With the continuous development and growing popularity of the Internet, the amount of online information is growing explosively. The problem is how to find the information we need correctly and quickly in this mass of data and place it at the top of the results. Against this background, Internet search engines have grown rapidly. This article describes the general principles and common technologies of search engines,...
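Dictionary-based Chinese word segmentation is often illustrated with forward maximum matching. The sketch below uses a plain Python set as the dictionary instead of the hierarchical dictionary binary tree named in the title; that substitution, and the function name, are assumptions for illustration only.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                words.append(piece)
                i += n
                break
    return words
```

The classic input 研究生命起源 ("study the origin of life") segments as 研究生 / 命 / 起源 with the dictionary {研究, 研究生, 生命, 命, 起源}, whereas the correct reading is 研究 / 生命 / 起源. This greedy ambiguity error is why dictionary structure and disambiguation strategy matter.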

chapter

An Anaphora Based Information Retrieval Model Extension

F. Santiago do Carmo Pereira, H. Seibel Junior, S.A.A. de Freitas

2009 WRI World Congress on Computer Science and Information Engineering > 4 > 330 - 334

2009 WRI World Congress on Computer Science and Information Engineering, CSIE

Classical information retrieval models are based on representations of document terms that do not consider linguistic elements. This article presents a model based on the Discourse Nominal Structure, which lets us take the linguistic characteristics of text into account. The model is evaluated against the vector space model. Based on observations during the experimentation, we propose...
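For reference, the vector space model baseline ranks documents by the cosine similarity between term vectors. This minimal sketch uses raw term frequencies with whitespace tokenization and no IDF weighting (simplifying assumptions); it shows only the baseline, not the anaphora-based extension.

```python
import math
from collections import Counter

def tf_vector(text):
    """Sparse term-frequency vector from whitespace-separated, lowercased tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return document indices sorted by cosine similarity to the query."""
    q = tf_vector(query)
    return sorted(range(len(docs)), key=lambda i: cosine(q, tf_vector(docs[i])), reverse=True)
```

Because each term is an independent dimension, a pronoun referring to "cat" contributes nothing to a "cat" query; resolving such anaphora is the gap a discourse-aware model targets.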

chapter

Information Extraction Using Link Grammar

N. Zamin

2009 WRI World Congress on Computer Science and Information Engineering > 5 > 149 - 153

2009 WRI World Congress on Computer Science and Information Engineering, CSIE

In the last few years, information extraction (IE) has become a rapidly expanding field as the number of machine-readable documents keeps growing exponentially. IE is well suited to transforming factual knowledge from publications into database entries. Many efforts have been made to automatically extract and mine scientific texts, ranging from biochemical literature to reports of terrorist attacks. This study is looking...

chapter

A Novel Algorithm for Normalizing Noisy Arabic Text

E.T. Al-Shammari

2009 WRI World Congress on Computer Science and Information Engineering > 4 > 477 - 482

2009 WRI World Congress on Computer Science and Information Engineering, CSIE

In this paper, an algorithm to normalize noisy text, focused solely on the Arabic language, is introduced. Although many theories discuss Arabic text processing, none so far has focused on noisy Arabic texts. Additionally, this paper introduces a new similarity measure for stemming noisy Arabic documents. The need for such a new measure stems from the...
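As a point of comparison, many Arabic IR pipelines apply a standard orthographic normalization before stemming. The sketch below shows that common step (removing harakat and tatweel, unifying alef, ta marbuta, and alef maqsura variants). The exact rule set is an assumption and is not the paper's algorithm or similarity measure.

```python
import re

# Harakat (U+064B..U+0652) and tatweel (U+0640) carry no lexical content for matching.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Common letter unifications (assumed rule set).
LETTER_MAP = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ة": "ه", "ى": "ي"})

def normalize_arabic(text):
    """Strip diacritics and tatweel, then unify common letter variants."""
    return DIACRITICS.sub("", text).translate(LETTER_MAP)
```

For example, a fully vocalized أَحْمَد and a stretched مـــدرسة normalize to احمد and مدرسه, so spelling variants typical of noisy text match the same index entry.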

chapter

A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool

D. Bhattacharyya, P. Das, D. Ganguly, K. Mitra, et al.

2009 International e-Conference on Advanced Science and Technology > 24 - 29

2009 International e-Conference on Advanced Science and Technology (AST 2009)

Extracting specific information from a collection of documents is called information extraction (IE). In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents basically uses learning techniques for pattern matching over the content. In this paper, we propose a novel approach for interactive information extraction...

article

Using an Ant Colony Metaheuristic to Optimize Automatic Word Segmentation for Ancient Greek

G. Tambouratzis

IEEE Transactions on Evolutionary Computation > 2009 > 13 > 4 > 742 - 753

Given a text or collection of texts involving unconstrained language, a basic task in a multitude of applications is the identification of stems and endings for each word form, which is termed morphological analysis. In this paper, the use of an ant colony optimization (ACO) metaheuristic is proposed for a linguistic task that involves the automated morphological segmentation of Ancient Greek word...

Data set: ieee
Keywords: DATA MINING; NATURAL LANGUAGE PROCESSING; TEXT ANALYSIS; INFORMATION RETRIEVAL


INFONA - science communication portal
