The following topics are dealt with: language lexicon, morphology, syntax and parsing; information extraction; text understanding and summarization; machine translation; language resources; semantics; and spoken language processing.
This paper describes a method for developing Bangla Enconversion within the framework of the Universal Networking Language (UNL). We also discuss some issues and problems in the UNL representation that affect the quality of generation. Additionally, lingware engineering is introduced as a technique to improve output quality and development efficiency. In this paper a...
Nouns and verbs pose the major challenge in part-of-speech tagging. In this paper we present a suffix-based noun and verb classifier for Assamese, an inflectional Indic language with relatively free word order. We used a small dictionary of frequent words to improve accuracy and obtained an F-score of around 85%.
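A suffix-based classifier backed by a frequent-word dictionary can be sketched as follows. This is a minimal illustration, not the authors' system: the romanized suffix lists and the `FREQUENT_WORDS` dictionary are hypothetical placeholders rather than a verified inventory of Assamese morphology, and the "longest suffix wins" rule is an assumption. The `f_score` helper shows how a per-tag F-score such as the reported figure is computed, as the harmonic mean of precision and recall.

```python
from typing import Dict, List, Optional

# Hypothetical romanized suffix lists for illustration only; a real system
# would derive these from an analysis of Assamese inflectional morphology.
NOUN_SUFFIXES: List[str] = ["bor", "tu", "khon", "ot", "r"]
VERB_SUFFIXES: List[str] = ["ise", "ilo", "ibo", "e"]

# Hypothetical "tiny dictionary" of frequent words whose tag is known in advance.
FREQUENT_WORDS: Dict[str, str] = {"ghor": "NOUN", "kore": "VERB"}


def classify(word: str) -> Optional[str]:
    """Tag a token as 'NOUN' or 'VERB', or return None if no rule applies."""
    # The frequent-word dictionary takes priority over the suffix rules.
    if word in FREQUENT_WORDS:
        return FREQUENT_WORDS[word]
    # Otherwise the longest matching suffix decides; a suffix must leave a
    # non-empty stem, and ties go to the list checked first (nouns).
    best_tag, best_len = None, 0
    for tag, suffixes in (("NOUN", NOUN_SUFFIXES), ("VERB", VERB_SUFFIXES)):
        for suffix in suffixes:
            if (word.endswith(suffix) and len(word) > len(suffix)
                    and len(suffix) > best_len):
                best_tag, best_len = tag, len(suffix)
    return best_tag


def f_score(gold: List[str], predicted: List[Optional[str]], tag: str) -> float:
    """F1 for one tag: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == tag and p == tag)
    fp = sum(1 for g, p in pairs if g != tag and p == tag)
    fn = sum(1 for g, p in pairs if g == tag and p != tag)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tokens = ["ghor", "manuhbor", "korise", "xyz"]
print([classify(t) for t in tokens])  # ['NOUN', 'NOUN', 'VERB', None]
```

Letting the dictionary override the suffix rules is one simple way to fix the most frequent, and often irregular, words that suffix matching alone would mislabel.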
In this paper, we propose a simple approach that uses verb-subcategorization-based pattern matching to rerank the output of a baseline parsing system. The baseline parser first provides a set of n-best candidate parse trees. We then extract various verb subcategorization features from the training corpus and use them to rerank the...
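One way to realize this kind of reranking is sketched below, assuming each candidate tree has already been reduced to a baseline log-probability plus the (verb, frame) pairs it contains. The `Candidate` class, the frame labels such as `"NP_NP"`, the probability floor and the interpolation `weight` are all hypothetical; the abstract does not specify how the subcategorization features are combined with the baseline score.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

VerbFrame = Tuple[str, str]  # (verb lemma, hypothetical subcategorization frame label)


@dataclass
class Candidate:
    tree_id: str
    baseline_logprob: float  # score assigned by the baseline parser
    verb_frames: List[VerbFrame] = field(default_factory=list)


def train_frame_model(pairs: Iterable[VerbFrame]) -> Dict[str, Counter]:
    """Count how often each verb occurs with each frame in the training corpus."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for verb, frame in pairs:
        model[verb][frame] += 1
    return model


def frame_logprob(model: Dict[str, Counter], verb: str, frame: str,
                  floor: float = 1e-3) -> float:
    """log P(frame | verb) by relative frequency, floored for unseen frames."""
    counts = model.get(verb)
    if not counts:
        return 0.0  # unseen verb: contributes no evidence either way
    p = counts[frame] / sum(counts.values())
    return math.log(max(p, floor))


def rerank(candidates: List[Candidate], model: Dict[str, Counter],
           weight: float = 1.0) -> List[Candidate]:
    """Sort n-best trees by baseline score plus weighted frame log-probabilities."""
    def score(c: Candidate) -> float:
        bonus = sum(frame_logprob(model, v, f) for v, f in c.verb_frames)
        return c.baseline_logprob + weight * bonus
    return sorted(candidates, key=score, reverse=True)


model = train_frame_model([("give", "NP_NP")] * 8 + [("give", "NP")] * 2)
nbest = [Candidate("A", -10.0, [("give", "NP")]),
         Candidate("B", -10.5, [("give", "NP_NP")])]
print([c.tree_id for c in rerank(nbest, model)])  # ['B', 'A']
```

Here tree B overtakes tree A because its frame is four times more frequent for "give" in the toy training data, which outweighs its slightly lower baseline score.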
The expression of lexical semantics is a crucial factor in natural language semantic processing. This paper proposes a new theoretical model for constructing a lexical semantic knowledge base. According to this theory, semantic genes are the carriers of lexical meaning. They can be inherited from hypernyms by hyponyms, and they may mutate during inheritance. By heredity, recombination and variation...
Noun phrase understanding is important for many subfields of natural language processing and information retrieval. This paper proposes a classification framework for Chinese post-modified V+N phrases. The basic idea is that most noun phrases can be mapped to corresponding clauses. Therefore, case, time, aspect and modality can also be encoded in noun phrases, as in verb phrases. All those...
We present work on the identification and classification of named entities in Tamil, a morphologically rich language. We use a machine learning technique, Conditional Random Fields (CRFs), for this task, and discuss the linguistic features used in the CRFs for this morphologically rich language.
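CRF toolkits typically take one feature dictionary per token. The sketch below shows affix and context features of the kind that suit a morphologically rich language, assuming whitespace-tokenized input (romanized here for readability). The feature names and the maximum affix length of 3 are illustrative assumptions, not the feature set used in the paper; the resulting per-sentence lists could then be passed to a CRF library such as sklearn-crfsuite for training.

```python
from typing import Dict, List, Union

Feature = Union[str, float, bool]


def token_features(tokens: List[str], i: int, max_affix: int = 3) -> Dict[str, Feature]:
    """Features for token i: the word, its affixes, and its immediate neighbours."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word": word,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }
    # Suffixes expose case and other inflectional markers in agglutinative
    # languages; prefixes add a little information about the stem.
    for k in range(1, max_affix + 1):
        if len(word) > k:
            feats[f"suffix{k}"] = word[-k:]
            feats[f"prefix{k}"] = word[:k]
    # Neighbouring words, with sentence-boundary placeholders.
    feats["prev_word"] = tokens[i - 1] if i > 0 else "<BOS>"
    feats["next_word"] = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


for feats in sentence_features(["chennaiyil", "irukkiraar"]):
    print(feats["word"], feats["suffix3"], feats["prev_word"], feats["next_word"])
```

Because the suffix features are plain string slices, the model can learn that a case ending attached to a name is a strong cue without a separate morphological analyser; a full system would likely add analyser output as further features.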
Anaphora resolution (AR) is the process of identifying the appropriate antecedent of an anaphor, where the antecedent occurs before the anaphor. AR can improve many NLP applications, such as question answering, short-answer examination systems and information extraction. Most AR systems deal with English. Thus, in the 1990s, AR research was extended to other languages, such as Arabic,...
This paper describes an approach to Vietnamese text summarization that focuses on the discourse structure of the text. Based on the characteristics of Vietnamese, we propose rules for segmenting text into elementary discourse units (edus) and for recognizing discourse relations between text spans. The score of an edu is computed from the discourse tree. The edus with the highest scores are chosen...
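The scoring and selection steps can be illustrated with a simple nuclearity-based scheme, assuming a binary discourse tree whose internal nodes are labelled `NS`, `SN` or `NN` (indicating which child is the nucleus) and whose leaves are edu indices. Each edu loses one point for every satellite edge on its path from the root, so nuclei near the top of the tree score highest. This scheme and the tree encoding are assumptions for illustration; the abstract does not give the paper's exact scoring formula, and building the tree itself (segmentation and relation recognition) is outside the sketch.

```python
from typing import Dict, List, Optional, Tuple, Union

# A leaf is an edu index; an internal node is (nuclearity, left, right).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def edu_scores(tree: Tree, level: int = 0,
               out: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Score = minus the number of satellite edges between the root and the edu."""
    if out is None:
        out = {}
    if isinstance(tree, int):
        out[tree] = level
        return out
    nuclearity, left, right = tree
    # A nucleus child keeps its parent's level; a satellite child drops by one.
    left_level = level if nuclearity in ("NS", "NN") else level - 1
    right_level = level if nuclearity in ("SN", "NN") else level - 1
    edu_scores(left, left_level, out)
    edu_scores(right, right_level, out)
    return out


def summarize(edus: List[str], tree: Tree, k: int) -> List[str]:
    """Pick the k highest-scoring edus and return them in document order."""
    scores = edu_scores(tree)
    # Highest score first; ties broken by original position.
    top = sorted(range(len(edus)), key=lambda i: (-scores[i], i))[:k]
    return [edus[i] for i in sorted(top)]


edus = ["e0", "e1", "e2"]
tree = ("NS", ("SN", 0, 1), 2)
print(edu_scores(tree))         # {0: -1, 1: 0, 2: -1}
print(summarize(edus, tree, 1))  # ['e1']
```

Returning the selected edus in document order, rather than score order, keeps the extract readable as running text.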
Inflection and derivation have been the main ways of creating new words in English. With the development of science and technology, such words are appearing in scientific literature faster than ever. Influenced by English, Chinese word formation with multiple affixes is also becoming a major source of new words in scientific literature. By studying the similarities of their original sources, this paper employs...
Many automatic word alignment techniques have been developed in Natural Language Processing (NLP). However, word alignment between English and Hindi has not progressed much, for two main reasons: the complex structure of the two languages and the scarcity of Hindi-language resources. This paper presents a corpus-augmented method of word alignment in which these limitations have...
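For context, a standard statistical baseline for word alignment is IBM Model 1, trained with expectation-maximization on a sentence-aligned corpus. The sketch below implements that generic baseline, not the corpus-augmented method of the paper. The toy romanized Hindi-English bitext is invented for illustration; after training, each Hindi word is linked to the English word (or the empty NULL word) with the highest learned translation probability.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NULL = "<NULL>"  # empty English word, so a Hindi word may stay unaligned
Bitext = List[Tuple[List[str], List[str]]]


def train_ibm1(bitext: Bitext, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM training of t[(f, e)] = P(f | e), f a Hindi word, e an English word."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: spread each Hindi token over the English tokens plus NULL.
        for fs, es in bitext:
            es_null = [NULL] + list(es)
            for f in fs:
                norm = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts for each English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(fs: List[str], es: List[str], t) -> List[Tuple[int, int]]:
    """Link Hindi position j to its most probable English position i (NULL links dropped)."""
    es_null = [NULL] + list(es)
    links = []
    for j, f in enumerate(fs):
        best = max(range(len(es_null)), key=lambda i: t.get((f, es_null[i]), 0.0))
        if best > 0:
            links.append((j, best - 1))
    return links


bitext = [(["bada", "ghar"], ["big", "house"]), (["ghar"], ["house"]),
          (["bada", "kamra"], ["big", "room"]), (["kamra"], ["room"])]
t = train_ibm1(bitext)
print(align(["bada", "ghar"], ["big", "house"], t))  # [(0, 0), (1, 1)]
```

Model 1 ignores word order entirely, which is one reason English-Hindi alignment is hard for such baselines: the languages differ in word order, and sparse data leaves many translation probabilities poorly estimated.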
This paper presents an approach to semi-automatically building synsets for the Indonesian WordNet using freely available monolingual lexical resources in Bahasa Indonesia. These resources are the Kamus Besar Bahasa Indonesia, or KBBI (a monolingual dictionary of Bahasa Indonesia), and the Tesaurus Bahasa Indonesia (an Indonesian thesaurus). We assume that monolingual resources will play an important...