Proceedings of 2006 International Conference on Machine Learning and Cybernetics

Chinese New Words Extraction Based on Machine Learning Approach

Zi-Ru Zhang, Qiang-Jun Wang, Xue-Dong Tian

2006 International Conference on Machine Learning and Cybernetics, pp. 3380-3384

Chinese new word extraction is an important problem in Chinese information processing. In this paper, a new word extraction method based on machine learning is proposed, in which context information, word construction rules, and statistical information are combined to extract new words. An experiment on two-character nouns shows that this method can effectively improve the efficiency and accuracy...
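The abstract does not detail its statistical component. As a rough illustration only, the sketch below scores adjacent two-character candidate strings by pointwise mutual information (PMI), a common association statistic for spotting strings that behave like words. PMI, the helper name `pmi_scores`, the `min_count` threshold, and the toy corpus are assumptions rather than the authors' method. The context features and word construction rules the paper combines with statistics are omitted.

```python
import math
from collections import Counter


def pmi_scores(corpus, min_count=2):
    """Score every adjacent two-character string by pointwise mutual information.

    Hypothetical helper: PMI stands in for the paper's unspecified statistics.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        # Every adjacent character pair is a two-character candidate.
        pair_counts.update(sentence[i:i + 2] for i in range(len(sentence) - 1))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue  # too rare to trust the estimate
        p_pair = count / total_pairs
        p_first = char_counts[pair[0]] / total_chars
        p_second = char_counts[pair[1]] / total_chars
        # Positive PMI: the two characters co-occur more often than chance.
        scores[pair] = math.log2(p_pair / (p_first * p_second))
    return scores
```

For example, `pmi_scores(["博客很好", "我的博客", "博客文章"])` keeps only the recurring candidate 博客, with a positive score. A real system would then filter such candidates against a dictionary and apply linguistic rules.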

A Divide-Conquer Strategy for English Text Chunking

Ying-Hong Liang, Ni-Hong Wang, Jian-Min Su, Hong-E Ren

2006 International Conference on Machine Learning and Cybernetics, pp. 3370-3375

The traditional English text chunking approach identifies phrases using only one model and the same types of features for all phrases. Using only one model has been shown to have two limitations: the same types of features are not suitable for all phrases, and data sparseness may also result. In this paper, a divide-and-conquer approach is proposed and applied to the identification...

Toward Unification of Source Attribution Processes and Techniques

F. Khosmood, R. Levinson

2006 International Conference on Machine Learning and Cybernetics, pp. 4551-4556

Automatic source attribution refers to the ability of an autonomous process to determine the source of a previously unexamined piece of writing. Statistical methods for source attribution have been the subject of scholarly research for well over a century. The field, however, still lacks a definitive currency of established or agreed-upon classes of features, methods, techniques, and nomenclature...

Ontology-Based Knowledge Management of Chinese Historical Official Titles: An Overview of the Hotkb Project

Fang Liu, Xing-Lin Yang

2006 International Conference on Machine Learning and Cybernetics, pp. 1454-1458

Knowledge of Chinese historical official titles is stored on paper or as electronic images, and its representations are based on natural language, which is more ambiguous than a formal language. As a result, this knowledge is difficult to retrieve and disseminate. This paper reports a method for building a knowledge base that provides a sharable and reusable knowledge resource on Chinese historical official titles. Consider...

Suffix Tree Based WEB Information Search System and Optimal Index Algorithms

Lian-Long Wu

2006 International Conference on Machine Learning and Cybernetics, pp. 4450-4454

Chinese information search engines always face the difficulty of segmenting an article into Chinese words. In this paper, a suffix-tree-based search approach is proposed to avoid this segmentation difficulty. Suffix tree algorithms are studied, and a set of optimal algorithms for building the index is proposed. Based on these algorithms, a prototype of Chinese information...
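The reason a suffix structure sidesteps segmentation is that every substring of a text is a prefix of one of its suffixes. Any query string can therefore be located without first deciding where words begin and end. The sketch below builds a naive suffix trie, which uses quadratic space in the text length. It is not the paper's optimized suffix tree index, and the class and method names are hypothetical.

```python
class SuffixTrie:
    """Naive suffix trie: inserts every suffix of the text, O(n^2) space."""

    def __init__(self, text):
        self.root = {}
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.setdefault(ch, {})
                # Record every suffix start that passes through this node;
                # "$pos" cannot clash with a child key, which is one character.
                node.setdefault("$pos", []).append(start)

    def find(self, query):
        """Return sorted start offsets of query in the text (empty if absent)."""
        node = self.root
        for ch in query:
            if ch not in node:
                return []
            node = node[ch]
        return sorted(node.get("$pos", []))
```

With `SuffixTrie("中文信息检索中文分词")`, `find("中文")` returns `[0, 6]` and `find("信息检索")` returns `[2]`. Neither lookup needs a word segmenter. Production indexes use linear-time suffix tree or suffix array construction instead of this quadratic trie.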

Chinese Named Entity Recognition using Support Vector Machines

Xu-Dong Lin, Hong Peng, Bo Liu

2006 International Conference on Machine Learning and Cybernetics, pp. 4216-4220

Named entity recognition (NER) is a low-level semantic technology. Because it is simple and efficient, it has been widely applied in many systems, such as machine translation, information retrieval, information extraction, question answering, and summarization. The goal of named entity recognition is to identify names in text and classify them into particular categories, such as the names of people, places, and organizations...

An Open Domain Question Answering System Based on Improved System Similarity Model

Yu-Ming Zhao, Zhi-Ming Xu, Yi Guan, Xiao-Long Wang

2006 International Conference on Machine Learning and Cybernetics, pp. 4521-4526

Question answering has recently received increasing attention from researchers and is widely regarded as an advanced stage of information retrieval. This paper presents a novel domain-independent question-answering system based on information retrieval over a large-scale text collection. An improved system similarity model is developed and applied in it, which improves the performance...

A Survey of Automatic Urdu Language Processing

Waqas Anwar, Xuan Wang, Xiao-Long Wang

2006 International Conference on Machine Learning and Cybernetics, pp. 4489-4494

Most research in the last few decades has focused on automatic natural language processing (NLP) for English, European, and East Asian languages. Unfortunately, South Asian languages, especially Urdu, have received less attention. In this paper, we present a survey regarding the classification of the Urdu language. The main goal of this survey is to briefly present the material available on Urdu NLP,...

A Cascaded Approach to the Optimization of Translation Rules

Shu-Jie Liu, Mu-Yun Yang, Tie-Jun Zhao

2006 International Conference on Machine Learning and Cybernetics, pp. 4089-4092

In rule-based machine translation (RBMT), rule acquisition remains a bottleneck. This paper proposes a cascaded approach to optimizing the rule base, which is automatically acquired from a bilingual corpus. Observing that errors are more likely in the upper layers of the parse tree, we propose a method that advocates optimizing rules by...

Application of Quotient Space Granularity Computation Theory in Pinyin-Chinese Character Conversion

Zheng-Yi Liu, Jian-Cheng Gong, Jian-Guo Wu

2006 International Conference on Machine Learning and Cybernetics, pp. 2670-2673

At present, the most widely used pinyin-to-Chinese-character conversion technology combines statistics with linguistic rules. Although it largely solves problems such as long-distance constraints and linguistic recursion, it requires a great deal of computation because there are too many candidate paths. This paper attempts to reduce the candidate paths by using quotient space granularity...

Integration Algorithm of English-Chinese Word Segmentation and Alignment

Zhi-Ming Xu, Chun-Yu Kit, J.J. Webster

2006 International Conference on Machine Learning and Cybernetics, pp. 4105-4110

This paper proposes an algorithm that integrates English-Chinese word segmentation and alignment. In this algorithm, bilingual word segmentation and alignment work synchronously and interactively. Given sentence-aligned bitext, it can not only use bilingual word alignment information to guide the resolution of word segmentation ambiguities, but also prevent word segmentation errors from being transferred...

A New Recognition Method for the Handwritten Manchu Character Unit

Guang-Yuan Zhang, Jing-Jiao Li, Ai-Xia Wang

2006 International Conference on Machine Learning and Cybernetics, pp. 3339-3344

Recognizing Manchu characters on the basis of Manchu character units is an efficient method. In this method, the recognition accuracy for character units strongly influences the final recognition result. As a new approach to this problem, a hybrid wavelet neural network scheme has been developed as a recognition method to replace the original minimum-distance method. Both the learning samples...

Research on Chinese Information Retrieval Based on a Hybrid Language Modeling

De-Quan Zheng, Tie-Jun Zhao, Feng Yu, Sheng Li, et al.

2006 International Conference on Machine Learning and Cybernetics, pp. 2586-2591

In information retrieval, users hope to obtain more relevant information from the top-ranked documents. In this paper, an ontology is combined with a statistical method to retrieve an initial document set, and the precision of the top N ranked documents is improved by re-ranking this set. An experiment on the NTCIR-3 Chinese CLIR dataset shows that the proposed method improves the precision...

Realizing Target Language Generation in Data-Oriented English-Chinese Machine Translation

Tag Zhang, Rui Feng, Yue-Jie Zhang

2006 International Conference on Machine Learning and Cybernetics, pp. 2624-2629

This paper presents a target language generation mechanism for data-oriented English-Chinese machine translation. The mechanism applies data-oriented parsing theory, traditionally used in language analysis, to target language generation as well. The final target-language translation is generated by linearizing the result of source language analysis, a parse tree. To prove...

Application-Oriented Comparison and Evaluation of Six Semantic Similarity Measures Based on Wordnet

Peng-Yuan Liu, Tie-Jun Zhao, Xiao-Feng Yu

2006 International Conference on Machine Learning and Cybernetics, pp. 2605-2610

For the task of automatically building a Chinese-English semantic lexicon for translation selection, this research presents a method that introduces WordNet similarity measures to filter out misaligned Chinese-English word pairs. Six previously proposed WordNet-based similarity measures were experimentally compared and evaluated using WordNet and the software package WordNet::Similarity. It was found...
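WordNet::Similarity implements several measures. The simplest, path similarity, scores two concepts as 1 / (1 + shortest path length) in the hypernym hierarchy. The sketch below computes that score over a tiny hand-made taxonomy. The taxonomy is hypothetical and is not WordNet data. A pair is flagged when its score falls below a threshold, and that threshold is also an assumption. The abstract does not say how Chinese-English pairs are mapped onto English word comparisons, and the sketch does not reproduce the paper's six measures or its evaluation.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parent); not real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "car": "vehicle",
    "vehicle": "artifact",
    "animal": "entity",
    "artifact": "entity",
}


def _undirected(edges):
    """Build an undirected adjacency map from child -> parent links."""
    adj = {}
    for child, parent in edges.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


ADJ = _undirected(HYPERNYM)


def path_similarity(a, b):
    """Return 1 / (1 + shortest path length) between two concepts, or 0.0."""
    if a not in ADJ or b not in ADJ:
        return 0.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist)
        for nxt in ADJ[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def looks_misaligned(a, b, threshold=0.2):
    """Flag a pair whose similarity falls below a hypothetical threshold."""
    return path_similarity(a, b) < threshold
```

In this toy hierarchy, dog and wolf score 1/3 because they are two edges apart. Dog and car score 1/8 and would be flagged.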

A Novel Chinese Multi-Document Summarization Using Clustering Based Sentence Extraction

De-Xi Liu, Yan-Xiang He, Dong-Hong Ji, Hua Yang

2006 International Conference on Machine Learning and Cybernetics, pp. 2592-2597

This paper proposes a strategy for Chinese multi-document summarization based on clustering and sentence extraction. It adopts term vectors to represent the linguistic units in Chinese documents, which yields somewhat higher representation quality than the traditional word-based vector space model. For clustering, we propose two heuristics to automatically detect the proper number of clusters:...
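The extraction step of a cluster-based summarizer can be sketched as picking, from each cluster, the sentence closest to the cluster centroid. The sketch below assumes that the clusters have already been formed, which is where the paper's heuristics for choosing the number of clusters would apply. It also assumes that each sentence comes with a sparse term vector (a dict from term to weight) and that cosine similarity is the closeness measure. These are illustrative choices, not details confirmed by the abstract, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average a non-empty list of sparse term vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def extract_summary(clusters):
    """Pick, from each cluster, the sentence whose vector is closest to the centroid.

    `clusters` maps a cluster id to a list of (sentence, term_vector) pairs.
    """
    summary = []
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        center = centroid([vec for _, vec in members])
        best_sentence, _ = max(members, key=lambda item: cosine(item[1], center))
        summary.append(best_sentence)
    return summary
```

Choosing the sentence nearest the centroid favors sentences that share the most vocabulary with the rest of their cluster. Real systems usually add redundancy checks and ordering on top of this step.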

Evaluation for Liaison of Spoken English: a Sugeno Integration Approach

Qing-Cai Chen, Peng-Fei Su, He-Jiao Huang, Xiao-Long Wang

2006 International Conference on Machine Learning and Cybernetics, pp. 2617-2623

Liaison evaluation for spoken English is one of the key problems in computer-aided spoken language learning. Although many factors affect the performance of a spoken language evaluation algorithm, two factors account for most of the obstacles: the natural casualness of spoken language and the unstable performance of existing speech processing systems. In this...
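The abstract is cut off before it describes how the Sugeno integral is used. For reference, the discrete Sugeno integral fuses per-criterion scores h(x_i) in [0, 1] with a fuzzy measure g. It is the maximum over i of min(h(x_(i)), g(A_i)), where scores are sorted in descending order and A_i holds the i highest-scoring criteria. The sketch below assumes a simple capped-additive fuzzy measure. The criterion names and densities are made up, and the paper's actual criteria and measure are not given here.

```python
def capped_additive_measure(densities):
    """Return g(A) = min(1, sum of densities in A): a monotone fuzzy measure.

    Hypothetical choice; the densities should sum to at least 1 so g(all) = 1.
    """
    def g(subset):
        return min(1.0, sum(densities[c] for c in subset))
    return g


def sugeno_integral(scores, g):
    """Discrete Sugeno integral of per-criterion scores in [0, 1]."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    best = 0.0
    for i, criterion in enumerate(ordered, start=1):
        top_i = ordered[:i]  # the i highest-scoring criteria
        best = max(best, min(scores[criterion], g(top_i)))
    return best
```

With hypothetical scores `{"acoustic": 0.9, "timing": 0.6, "context": 0.3}` and densities `{"acoustic": 0.4, "timing": 0.3, "context": 0.4}`, the integral is 0.6. A high score on one criterion cannot dominate unless the measure also gives that criterion enough weight. This max-min behavior is what makes the Sugeno integral attractive for fusing unreliable evidence.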

A Fuzzy Pronunciation Evaluation Model for English Learning

Peng-Fei Su, Qing-Cai Chen, Xiao-Long Wang

2006 International Conference on Machine Learning and Cybernetics, pp. 2598-2604

Pronunciation evaluation for spoken English is one of the key problems in computer-aided spoken language learning. While most researchers focus on improving speech recognition to build a reliable evaluation system, a model is still needed that fuses the reliability of existing speech processing systems and learners' individual characteristics into the evaluation system. In this paper,...

Active Learning using Localized Generalization Error for Text Categorization

D.S. Yeung, Y. Zhang, W.W.Y. Ng, Qing-Cai Chen

2006 International Conference on Machine Learning and Cybernetics, pp. 2686-2691

Text categorization is an important step in many applications, e.g., Web page classification, search engine indexing, and information retrieval. When the number of available documents is huge, active learning can help reduce training time and cost. Moreover, active learning can filter out noisy training samples and may therefore achieve better generalization capability....

Research on Dual Pattern of Unsupervised and Supervised Word Sense Disambiguation

Yao-Feng Wang, Yue-Jie Zhang, Zhi-Ting Xu, Tao Zhang

2006 International Conference on Machine Learning and Cybernetics, pp. 2665-2669

As an important task in natural language processing, word sense disambiguation (WSD) has been a research focus since 1950. WSD is very difficult, and most modern algorithms fail to reach a satisfactory level. WSD determines the sense of a polysemous word in a specific context, which involves two steps: determining all the senses of the polysemous...

Keywords:
NATURAL LANGUAGES

Keywords

TEXT ANALYSIS (8)
COMPUTATIONAL LINGUISTICS (5)
GRAMMARS (5)
LEARNING (ARTIFICIAL INTELLIGENCE) (5)
INFORMATION RETRIEVAL (4)
LANGUAGE TRANSLATION (4)
NATURAL LANGUAGE PROCESSING (4)
WORD PROCESSING (4)
DICTIONARIES (3)
FEATURE EXTRACTION (3)
INTERNET (3)
STATISTICAL ANALYSIS (3)
SUPPORT VECTOR MACHINES (3)
TREE DATA STRUCTURES (3)
CHARACTER RECOGNITION (2)
COMPUTER AIDED INSTRUCTION (2)
COMPUTER AIDED SPOKEN LANGUAGE LEARNING (2)
KNOWLEDGE ACQUISITION (2)
LINGUISTICS (2)
MACHINE TRANSLATION (2)
NAMED ENTITY RECOGNITION (2)
NEURAL NETS (2)
ONTOLOGIES (ARTIFICIAL INTELLIGENCE) (2)
ONTOLOGY (2)
PATTERN CLASSIFICATION (2)
PATTERN RECOGNITION (2)
SPEECH PROCESSING (2)
SPHINX-4 (2)
SUGENO INTEGRAL (2)
SUPPORT VECTOR MACHINE (2)
TEXT CATEGORIZATION (2)
UNSUPERVISED LEARNING (2)
WORD SEGMENTATION (2)
WORD SIMILARITY (2)
WORDNET (2)
ACTIVE LEARNING (1)
ADAPTED EXTENDED LESK ALGORITHM (1)
ARCHAIC CHINESE DATABASE (1)
ASSOCIATION RULES (1)
AUTHORSHIP ATTRIBUTION (1)
AUTOMATIC CHINESE TEXT CATEGORIZATION (1)
AUTOMATIC SOURCE ATTRIBUTION PROCESS (1)
AUTOMATIC URDU LANGUAGE PROCESSING (1)
BAYES METHODS (1)
BAYESIAN FEATURE SETS (1)
BI-GRAM FEATURE SETS (1)
BILINGUAL CORPUS (1)
BILINGUAL DICTIONARY (1)
BILINGUAL WORD ALIGNMENT (1)
BILINGUAL WORD SEGMENTATION (1)
BOTTOM-UP STRATEGY (1)
CASCADED OPTIMIZATION APPROACH (1)
CHARACTER RECOGNITION METHOD (1)
CHINESE HISTORICAL OFFICIAL TITLES (1)
CHINESE INFORMATION PROCESSING (1)
CHINESE INFORMATION RETRIEVAL (1)
CHINESE INFORMATION SEARCH ENGINES SYSTEM (1)
CHINESE LANGUAGE LEARNER (1)
CHINESE MULTI-DOCUMENT SUMMARIZATION (1)
CHINESE MULTIDOCUMENT SUMMARIZATION (1)
CHINESE NAMED ENTITY RECOGNITION (1)
CHINESE NEW WORD EXTRACTION (1)
CHINESE QUESTION SENTENCE (1)
CHINESE TEXT CATEGORIZATION (1)
CHINESE TEXT CLASSIFICATION (1)
CHINESE WEB TEST COLLECTION (1)
CHINESE-ENGLISH SEMANTIC LEXICON (1)
CHUNK PARSING THEORY (1)
CHUNK SIMILARITY (1)
CLASSIFICATION (1)
COMPUTER TECHNOLOGY (1)
CONTEXT INFORMATION (1)
DATA MINING (1)
DATA SPARSENESS (1)
DATA-ORIENTED ENGLISH-CHINESE MACHINE TRANSLATION (1)
DATA-ORIENTED PARSING (1)
DATA-ORIENTED PARSING THEORY (1)
DATABASE MANAGEMENT SYSTEMS (1)
DICTIONARY (1)
DIVIDE AND CONQUER METHODS (1)
DIVIDE-CONQUER STRATEGY (1)
DOCUMENT HANDLING (1)
DOCUMENT INDEXING (1)
ELECTRONIC IMAGE (1)
ENGLISH LEARNING (1)
ENGLISH PHONEME RECOGNITION (1)
ENGLISH PHRASE IDENTIFICATION (1)
ENGLISH TEXT CHUNKING APPROACH (1)
ENGLISH-CHINESE WORD SEGMENTATION (1)
ERROR STATISTICS (1)
EXAMPLE-BASED MACHINE TRANSLATION (1)
EXTRINSIC EVALUATION (1)
EXTRINSIC EVALUATION METHOD (1)
FEATURE EXTRACTION METHOD (1)
FRAGMENT BANK (1)
FRAGMENT-COMBINATION-FORM BANK (1)
FUZZY ASSOCIATION RULE MINING (1)
FUZZY MEASURE (1)
FUZZY PRONUNCIATION EVALUATION MODEL (1)

INFONA - science communication portal
