Morphological analysis is an essential step in processing the Korean language, owing to the language's highly agglutinative properties. In this paper, we propose a novel approach for constructing a Korean morphological analyzer that can capture linguistic properties using graphemes as basic processing units. Since our model does not utilize prior linguistic knowledge, the model can be applied to other...
This paper presents a new version of ExATO, a term extractor originally designed to extract relevant terms from corpora in Portuguese. The new version handles not only Portuguese corpora but also English texts. This extension is likely to offer the same level of quality already achieved for Portuguese. In this paper, we draw the analysis of results in parallel corpora...
Political debates about a reform may spark national controversies by leading members of the community to polarize their opinions and sentiment about the topic addressed. With the rise of social media such as Twitter, users are encouraged to voice and share their strong and polarized views, and in general people are exposed to broader viewpoints than they were before. The large amount of user-generated...
In this paper, we describe an approach for extracting named entities (NEs) from Arabic texts. The Arabic language is hard to process because of its characteristics, which also affect NE extraction. In our case, we treat named entity extraction as a typical classification problem. Indeed, this extraction consists of searching for text portions that can be classified in a NE...
Assigning the appropriate grammatical category to a word given a context is a very important step in major areas of natural language processing. A limited number of part-of-speech (POS) taggers currently exist for Arabic. These taggers mainly adopt tagsets that were developed for languages such as English. In this paper we present an effort to propose revised categories for Arabic POS tags that would...
We present a data mining approach for part-of-speech (POS) tagging, an important natural language processing (NLP) classification task. We propose a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses a combination of a small POS-tagged corpus and untagged...
This paper describes the development of a parser algorithm used for Hindi-English machine translation (MT). Machine translation requires analysis, transfer and generation steps to produce target language output from a source language input. A structural representation of Hindi sentences encodes the information they carry, and a transfer module can be designed to generate English sentences...
Combinational ambiguity is a challenging issue in Chinese word segmentation because its disambiguation depends on contextual information. This paper collects contextual information for 28 typical combinational ambiguity strings, and makes use of lexical, syntactic and semantic knowledge and a large-scale corpus to summarize the rules of these combinational ambiguity strings. Using these rules to...