State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. The quality and quantity of the training data directly influence the performance of such translation systems. The lack of open-source bilingual corpora for a particular language pair results in lower translation scores being reported for that pair. This is...
This paper presents a hybrid model that combines conditional random fields (CRFs) with dynamic gazetteers (DGs) for Chinese named entity recognition (NER). Gazetteers have been widely used in previous NER work, but they were all static and therefore cannot adapt to new domains or to new out-of-vocabulary named entities (OOVNEs). In this work, we build and maintain...
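The abstract does not say how the dynamic gazetteers are built or updated, so the following is only a generic sketch, not the paper's method. It shows one common way gazetteer matches are turned into per-character features for a character-level CRF tagger (forward maximum matching with B/I/E/S/O labels), plus an `add` method as a hypothetical stand-in for the "dynamic" part. The class name `DynamicGazetteer`, its methods, and the labeling scheme are all assumptions made for illustration.

```python
class DynamicGazetteer:
    """Hypothetical gazetteer whose entry set can grow at run time."""

    def __init__(self, entries=()):
        self.entries = set(entries)
        self.max_len = max((len(e) for e in self.entries), default=0)

    def add(self, entity):
        # Assumption: newly found entities (e.g. from a new domain) are added here.
        self.entries.add(entity)
        self.max_len = max(self.max_len, len(entity))

    def match_features(self, text):
        """Label each character B/I/E/S via forward maximum matching, O if unmatched."""
        labels = ["O"] * len(text)
        i = 0
        while i < len(text):
            # Try the longest possible entry first.
            for n in range(min(self.max_len, len(text) - i), 0, -1):
                if text[i:i + n] in self.entries:
                    if n == 1:
                        labels[i] = "S"
                    else:
                        labels[i] = "B"
                        for k in range(i + 1, i + n - 1):
                            labels[k] = "I"
                        labels[i + n - 1] = "E"
                    i += n
                    break
            else:
                # No entry starts at position i.
                i += 1
        return labels


gaz = DynamicGazetteer(["北京"])
print(gaz.match_features("我在北京大学"))  # ['O', 'O', 'B', 'E', 'O', 'O']
gaz.add("北京大学")
print(gaz.match_features("我在北京大学"))  # ['O', 'O', 'B', 'I', 'I', 'E']
```

In a CRF, each label would typically be used as one more observation feature for its character alongside the character itself; because the gazetteer can be extended without retraining, newly added entries can influence tagging immediately.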
Many automatic word alignment techniques have been developed so far in Natural Language Processing (NLP). However, word alignment between English and Hindi has not progressed much, for two main reasons, viz. the complex structure of the participating languages and the scarcity of Hindi-language resources. This paper presents a corpus-augmented method of word alignment in which these limitations have...
This paper describes a hybrid system that applies a maximum entropy (MaxEnt) model together with a hidden Markov model (HMM) and some linguistic rules to recognize named entities in the Oriya language. The main advantage of our system is that it uses both the HMM and the MaxEnt model successively, together with some manually developed linguistic rules. First, we use MaxEnt to identify named entities in the Oriya corpus, then tagging...
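The truncated abstract does not describe how the HMM stage is configured, so this is only a minimal sketch of standard HMM decoding (the Viterbi algorithm) for named-entity tagging. It does not reproduce the MaxEnt stage or the linguistic rules. The tag set (`PER`, `O`), the tokens, and all probabilities below are made-up toy values, and `unk` is an assumed smoothing probability for unseen words.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely tag sequence for tokens under an HMM (log space)."""
    if not tokens:
        return []
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(tokens[0], unk))
             for s in states}]
    back = [{s: None for s in states}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s].get(tokens[t], unk))
            back[t][s] = prev
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ["PER", "O"]
start_p = {"PER": 0.4, "O": 0.6}
trans_p = {"PER": {"PER": 0.3, "O": 0.7}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {
    "PER": {"Rama": 0.8, "went": 0.1, "home": 0.1},
    "O": {"Rama": 0.05, "went": 0.5, "home": 0.45},
}
print(viterbi(["Rama", "went", "home"], states, start_p, trans_p, emit_p))
# ['PER', 'O', 'O']
```

Working in log space avoids numerical underflow on long sentences; the sketch assumes all transition and start probabilities are strictly positive, since `math.log(0)` raises an error.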
Unknown word recognition is an important problem in natural language processing; it strongly influences the performance of dictionary construction and word segmentation. This paper introduces two methods for improving Chinese unknown word recognition with Conditional Random Fields (CRFs): rough labeling of characters and N-best listing. The CRF with the two methods proposed...