The following topics are dealt with: language lexicon, morphology, syntax and parsing; information extraction; text understanding and summarization; machine translation; language resources; semantics; and spoken language processing.
This paper discusses rendering issues in complex text layout for the traditional Mongolian script, which has been standardized in Unicode. We analyzed existing OpenType fonts and their rendering schemes for traditional Mongolian script, found some errors, and identified grammatical rules that are not documented in international standards. None of the existing OpenType...
Mongolian lexical analysis is the first step in Mongolian information processing tasks such as Chinese-Mongolian machine translation. In this paper, we introduce a statistical and rule-based approach that performs Mongolian word segmentation and POS tagging jointly. The method uses a tree frame as its basic statistical model, which we then combine with rules to improve the lexical analysis...
In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that combining LTA with CRFs resolves the label bias problem caused by rich and complex suffixes, making the approach more effective than previous work: word tokenization accuracy reaches 93.3%. We expect this tokenization research to be useful for information processing in other Altaic languages as well.
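Letter tagging recasts tokenization as labeling each letter of a word. The abstract does not give the tag set, so the sketch below assumes a common B/M/E/S scheme (begin, middle, end, single) and shows only the two conversions around the tagger: turning a segmented training word into letter tags, and turning predicted letter tags back into tokens. The CRF that would predict the tags is not shown, and the Latin-transliterated stem-plus-suffix example is illustrative.

```python
from typing import List


def segments_to_tags(segments: List[str]) -> List[str]:
    """Tag each letter: B = begins a multi-letter unit, M = middle, E = ends it, S = single-letter unit."""
    tags: List[str] = []
    for seg in segments:
        if len(seg) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(seg) - 2) + ["E"])
    return tags


def tags_to_segments(word: str, tags: List[str]) -> List[str]:
    """Cut the word after every letter tagged E or S (the inverse of segments_to_tags)."""
    segments: List[str] = []
    current = ""
    for letter, tag in zip(word, tags):
        current += letter
        if tag in ("E", "S"):
            segments.append(current)
            current = ""
    if current:  # tolerate an ill-formed tag sequence that does not end in E or S
        segments.append(current)
    return segments
```

For example, the segmentation `["kitab", "lar"]` becomes the letter tags `B M M M E B M E`, which is the kind of per-letter label sequence a CRF would be trained to predict.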
This paper describes a method for developing Bangla Enconversion within the framework of the Universal Networking Language (UNL). We also discuss some issues and problems related to UNL representation that affect the quality of generation. Additionally, lingware engineering is introduced as a technique to enhance quality and increase development efficiency. In this paper a...
Nouns and verbs pose the major challenge in part-of-speech tagging. In this paper we present a suffix-based noun and verb classifier for Assamese, an inflectional Indic language with relatively free word order. We used a small dictionary of frequent words to increase accuracy and obtained an F-score of around 85%.
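A suffix-based classifier with a frequent-word dictionary can be sketched as a dictionary lookup followed by suffix matching. The abstract does not list the Assamese suffixes or the dictionary contents, so the function takes them as parameters; any suffixes or words passed to it in examples are made-up placeholders. Preferring the longest matching suffix (with nouns winning exact ties) is an assumption about how conflicting noun and verb suffixes are resolved.

```python
from typing import Dict, Iterable, Optional


def classify(word: str,
             dictionary: Dict[str, str],
             noun_suffixes: Iterable[str],
             verb_suffixes: Iterable[str]) -> Optional[str]:
    """Return 'N', 'V', or None (unknown).

    The dictionary of frequent words is consulted first; otherwise the
    longest suffix that the word ends with decides the class.
    """
    if word in dictionary:
        return dictionary[word]
    best_len, best_tag = 0, None
    for tag, suffixes in (("N", noun_suffixes), ("V", verb_suffixes)):
        for suf in suffixes:
            # Strict '>' means an equally long verb suffix does not override a noun match.
            if word.endswith(suf) and len(suf) > best_len:
                best_len, best_tag = len(suf), tag
    return best_tag
```

Consulting the dictionary first lets frequent irregular words bypass suffix rules entirely, which is one plausible way a small word list could raise accuracy.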
This paper discusses the behavior of 'kaa' and proposes selecting its part of speech (POS) on the basis of linguistic evidence. It also suggests tests that can be used to classify 'kaa' correctly. Selecting the correct POS is important for computational processing, including parsing, generation, and the identification of grammatical relations.
Motivated by the need to establish a modern Uygur morpheme database, this paper studies the principles and methods for defining Uygur morphemes, focusing on special cases including the syllabic structure of morphemes, dual-part words, morpheme clusters, and compound morphemes. It is a foundational study toward establishing a Uygur morpheme database.
The Universal Networking Language (UNL) deals with communication across nations that speak different languages and involves many related disciplines, such as linguistics, epistemology, and computer science. It helps overcome the language barrier between people of different nations and address problems arising from current globalization trends and geopolitical interdependence. Morphological...
Based on the general syllable structure, a syllable's component letters should be expanded, in order, into the sequence basic consonant, prefix consonant, head consonant... and second suffix consonant. If a syllable has no letter in a particular position, a special character whose collation element is lower than that of any Tibetan letter should be placed in the corresponding position of the...
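The padding rule can be illustrated with tuple sort keys. This sketch assumes the syllable has already been parsed into named positions. The slot names, the letter ranks, and the shortened slot order (only the positions the text names, with the elided middle positions left out) are all hypothetical. Because an empty position receives rank 0, a syllable lacking a letter in some position sorts before an otherwise identical syllable that has one.

```python
from typing import Dict, Mapping, Sequence, Tuple

# Hypothetical, shortened slot order: only the positions named in the text.
# The full order (elided in the source) would insert further slots before "second_suffix".
SLOT_ORDER: Tuple[str, ...] = ("basic", "prefix", "head", "second_suffix")

# Hypothetical letter ranks over transliterated letters; a real implementation
# would use the collation elements of the Tibetan alphabet.
LETTER_RANK: Dict[str, int] = {"k": 1, "kh": 2, "g": 3, "ng": 4, "d": 5, "b": 6, "r": 7, "s": 8}

EMPTY = 0  # stands in for the special filler character: lower than any letter rank


def collation_key(syllable: Mapping[str, str],
                  slots: Sequence[str] = SLOT_ORDER) -> Tuple[int, ...]:
    """Expand a parsed syllable into fixed slot positions, padding empty slots with EMPTY."""
    return tuple(LETTER_RANK[syllable[s]] if s in syllable else EMPTY for s in slots)
```

Sorting with `sorted(syllables, key=collation_key)` then compares basic consonants first and falls back to the other positions only on ties.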
The Universal Networking Language (UNL) is a worldwide, generalized form of human interactive language on a machine-independent digital platform for defining, recapitulating, amending, storing, and disseminating knowledge or information among people of different affiliations. The theoretical and applied research associated with this interdisciplinary endeavor facilitates a number of practical applications...
The high-order graph-based dependency parsing model achieves state-of-the-art accuracy by incorporating rich feature representations. However, its parsing efficiency and accuracy degrade dramatically as input sentences get longer. This paper presents a novel two-stage method to improve high-order graph-based parsing, which uses punctuation, such as commas and semicolons, to segment the input...
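A first-stage segmentation of this kind can be sketched as splitting the token sequence at delimiter punctuation. The abstract is truncated before describing how fragments are parsed and recombined, so this shows only the splitting step, under the assumptions that each delimiter stays attached to the fragment it closes and that full-width Chinese comma and semicolon variants are also delimiters.

```python
from typing import List

# Commas and semicolons are named in the text; the full-width variants are an assumption.
SPLIT_PUNCT = {",", ";", "，", "；"}


def segment_by_punctuation(tokens: List[str]) -> List[List[str]]:
    """Split a tokenized sentence into fragments, closing a fragment at each delimiter."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in SPLIT_PUNCT:
            segments.append(current)
            current = []
    if current:  # trailing material after the last delimiter
        segments.append(current)
    return segments
```

Each fragment is much shorter than the full sentence, which is what makes a high-order parser cheaper to run on it.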
In this paper, we applied the VB-EM algorithm to estimate constituent-combination probabilities for a PGLR parser. Three linguistic features were calculated: simple PCFG, head-outward dependency, and head-emission. The probabilities were used during parsing to find the most probable output tree. In our experiment, the parsing result from a combination of all features for first path and...
In this paper, we propose a simple approach that uses verb-subcategorization-based pattern matching to rerank the output of a baseline parser. The baseline parser first provides a set of n-best candidate parse trees. We then extract various verb subcategorization features from the training corpus and use them to rerank the...
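The reranking step can be sketched as combining the baseline parser's score with a verb-subcategorization score. The abstract is truncated before the features and scoring are specified, so everything here is an assumption: each parse is pre-reduced to hypothetical (verb, frame) pairs, the subcategorization score is an add-alpha smoothed log P(frame | verb) estimated from training counts, and `weight`, `alpha`, and `n_frames` are illustrative parameters. This is a generic n-best reranking pattern, not the paper's method.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]                  # hypothetical (verb, subcat_frame), e.g. ("give", "NP_NP")
Candidate = Tuple[float, List[Pair]]    # (baseline log score, pairs extracted from the parse)


def train_subcat_model(corpus: List[List[Pair]]) -> Dict[str, Counter]:
    """Count how often each verb occurs with each subcategorization frame in training parses."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence_pairs in corpus:
        for verb, frame in sentence_pairs:
            counts[verb][frame] += 1
    return counts


def subcat_logprob(counts: Dict[str, Counter], verb: str, frame: str,
                   alpha: float = 1.0, n_frames: int = 10) -> float:
    """Add-alpha smoothed log P(frame | verb); n_frames is an assumed frame inventory size."""
    c = counts.get(verb, Counter())
    return math.log((c[frame] + alpha) / (sum(c.values()) + alpha * n_frames))


def rerank(nbest: List[Candidate], counts: Dict[str, Counter],
           weight: float = 1.0) -> List[Candidate]:
    """Sort candidates best-first by baseline score plus weighted subcategorization score."""
    def score(cand: Candidate) -> float:
        base, pairs = cand
        return base + weight * sum(subcat_logprob(counts, v, f) for v, f in pairs)
    return sorted(nbest, key=score, reverse=True)
```

In practice `weight` would be tuned on held-out data so that the subcategorization evidence can overturn the baseline ranking only when the baseline margin is small.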
This paper proposes a method for identifying Maximal-Length Noun Phrases (MNPs) based on Maximal-Length Preposition Phrases (MPPs). We identify MNPs by exploiting the mutually restricting relationship between MNPs and adverbial MPPs. We employ a Conditional Random Fields (CRFs) model for identification, using new tags and the long-distance words mentioned above as features. Experimental results show high-quality performance...
In this work, chunking is used to mark noun phrases in Urdu sentences. The approach is a hybrid that combines a statistical method with hand-crafted rules. The statistical model is an HMM with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 for testing. Several experiments...
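IOB annotation turns chunking into per-token tagging, which is what lets an HMM model it. The abstract does not name the exact tag strings, so this sketch assumes the common `B-NP` / `I-NP` / `O` convention and shows the conversion between noun-phrase spans and tags in both directions. The HMM itself is not shown, and treating a stray `I-NP` after `O` as a chunk start is a design assumption.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) token indices of a noun-phrase chunk


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Label tokens B-NP (chunk start), I-NP (inside a chunk) or O (outside any chunk)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B-NP"
        for i in range(start + 1, end):
            tags[i] = "I-NP"
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Recover chunk spans from a (possibly ill-formed) IOB tag sequence."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B-NP" or (tag == "I-NP" and start is None):
            if start is not None:       # a new chunk closes the previous one
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:               # chunk running to the end of the sentence
        spans.append((start, len(tags)))
    return spans
```

The `B-NP` tag is what allows two adjacent noun phrases to stay separate after decoding.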
Lexicalized language models, which incorporate phrase heads into statistical modeling, have recently become a hotspot in grammar research. This paper summarizes four improved language models of this kind: they use heads extracted by a CFG and calculate probabilities between heads or within the CFG. Unlike N-gram models and SCFGs, the probability calculation...
Finding semantic similarity is an important task in many natural language processing applications. Despite numerous studies for widely used languages, research on Vietnamese remains limited. In this paper, we tackle the problem of finding semantic similarity for Vietnamese using Random Indexing and Hyperspace Analogue to Language to represent the semantics of words and documents. We build a...
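Random Indexing, one of the two representations named here, can be sketched in a few lines: every word receives a sparse random ternary index vector, and a word's context vector is the sum of the index vectors of its neighbours within a window. Word similarity is then the cosine of context vectors. The dimension, number of nonzero entries, window size, and seed below are illustrative assumptions, not the paper's settings, and Hyperspace Analogue to Language is not shown.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def index_vector(dim: int, nonzero: int, rng: random.Random) -> List[int]:
    """Sparse ternary random vector: `nonzero` randomly chosen positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(sentences: List[List[str]], dim: int = 512, nonzero: int = 8,
                    window: int = 2, seed: int = 0) -> Dict[str, List[float]]:
    """Build each word's context vector by summing the index vectors of its neighbours."""
    rng = random.Random(seed)
    index: Dict[str, List[int]] = {}
    for sent in sentences:
        for w in sent:
            if w not in index:
                index[w] = index_vector(dim, nonzero, rng)
    context: Dict[str, List[float]] = defaultdict(lambda: [0.0] * dim)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = context[w]
                for k, v in enumerate(index[sent[j]]):
                    if v:
                        ctx[k] += v
    return dict(context)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Because index vectors are nearly orthogonal in high dimensions, words that share neighbours end up with similar context vectors without ever building a full co-occurrence matrix.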
In developing various natural language processing systems, such as question-answering and text-to-scene conversion systems, we found that knowledge of textual entailment substantially improves system performance. Systems with entailment knowledge behave more intelligently than those without it. Currently, many research teams are focusing on textual entailment,...
Computer recognition of an adverb's semantic orientation is a new attempt to approach sentence processing from the semantic side. In this paper, to enable automatic identification of the semantic orientation of the adverb “Jiù”, we summarize rules and propose principles for this type of semantic orientation according to sentence structure. On the basis of these,...