The following topics are dealt with: language lexicon, morphology, syntax and parsing; information extraction; text understanding and summarization; machine translation; language resources; semantics; and spoken language processing.
Summary form only given. In this talk, the speaker will measure the reduction in ambiguity that can be gained by using translated text to constrain meanings. Instead of using the translation itself to determine senses, they use a shared hierarchy of word senses: WordNet. Experiments with aligned Chinese, English and Japanese text show a substantial reduction in ambiguity for each language.
This paper discusses the rendering issues of a complex text layout script: traditional Mongolian. The traditional Mongolian script has been standardized in Unicode. We analyzed existing OpenType fonts and their rendering schemes for traditional Mongolian script. We found some errors, and discovered grammatical rules that are not documented in international standards. None of the existing OpenType...
Mongolian lexical analysis is the first step in Mongolian information processing tasks such as Chinese-Mongolian machine translation. In this paper, we introduce a combined statistical and rule-based approach that performs Mongolian word segmentation and POS tagging in a single pass. The method uses a tree frame as its basic statistical model, and then combines the model with some rules to improve the lexical analysis...
In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that the label bias problem (rich and complex suffixes) can be resolved using LTA combined with CRFs, making the approach more effective than previous work; word tokenization accuracy reaches 93.3%. In the future, our tokenization research should be useful for information processing in other Altaic languages.
This paper describes a method for the development of Bangla Enconversion within the framework of the Universal Networking Language (UNL). We also discuss some issues and problems related to the UNL representation that affect the quality of generation. Additionally, lingware engineering is introduced as a technique to enhance quality and increase development efficiency. In this paper a...
Nouns and verbs pose the major challenge in part-of-speech tagging exercises. In this paper we present a suffix-based noun and verb classifier for Assamese, an inflectional, relatively free word order Indic language. We used a tiny dictionary of frequent words to increase the accuracy. We obtained an F-score of around 85%.
This paper discusses the behavior of 'kaa' and suggests the selection of Part of Speech (POS) on the basis of linguistic evidence. It also suggests some tests that can be used for correct classification of 'kaa'. The selection of correct POS is important for computational processing, including parsing, generation, and identification of grammatical relations.
Motivated by the need to establish a modern Uygur morpheme database, the paper studies the principles and methods for defining Uygur morphemes, focusing on some special cases including the syllabic properties of morphemes, dual-part words, morpheme clusters and compound morphemes. It is a foundational study for the establishment of a Uygur morpheme database.
The Universal Networking Language (UNL) deals with communication across nations with different languages and involves many related disciplines, such as linguistics, epistemology and computer science. It helps to overcome the language barrier among people of different nations in order to solve problems emerging from current globalization trends and geopolitical interdependence. Morphological...
Based on the general syllable structure, a syllable's component letters should be expanded in order into the series of basic consonant, prefix consonant, head consonant... and the second suffix consonant. If there is no letter in a particular position of a syllable, a special character, whose collation element is less than that of any Tibetan letter, should be used in the corresponding position of the...
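The padding rule described in this abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the slot names in `SLOTS` are hypothetical and cover only the positions the abstract names (the elided ones are left out), the padding character `PAD` is an assumed choice, and plain code-point comparison stands in for real collation elements.

```python
from typing import Dict, List, Tuple

# Hypothetical slot names. Only the positions named in the abstract are listed;
# the abstract elides the ones in between, so this list is deliberately incomplete.
SLOTS: List[str] = ["basic", "prefix", "head", "second_suffix"]

# Assumed padding character: U+0000 compares lower than every Tibetan letter
# under plain code-point comparison (a stand-in for real collation elements).
PAD = "\u0000"


def sort_key(syllable: Dict[str, str], slots: List[str] = SLOTS) -> Tuple[str, ...]:
    """Expand a syllable into a fixed-length key, padding empty positions."""
    return tuple(syllable.get(slot, PAD) for slot in slots)


# Toy data: two syllables sharing the basic consonant KA (U+0F40);
# the second one also carries a prefix consonant DA (U+0F51).
plain = {"basic": "\u0f40"}
prefixed = {"basic": "\u0f40", "prefix": "\u0f51"}

# Because the empty prefix slot is padded with a character lower than any
# letter, the syllable without a prefix sorts first.
ordered = sorted([prefixed, plain], key=sort_key)
```

Every key has the same length, so syllables are compared position by position, and an empty position always sorts before a filled one.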