The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The subtree kernel and the information tree kernel proposed here measure the syntactic similarity of sentences. For two syntactic trees, these kernels are defined, respectively, as the total number of common subtrees in the syntactic trees and the total information content contained in their common subtrees, where the information content of a common subtree is calculated using its probability. Analyses...
We have investigated the effect of normalizing Japanese orthographical variants into a uniform orthography on statistical machine translation (SMT) between Japanese and English. In Japanese, 10% of words have reportedly more than one orthographical variants, which is a promising fact for improving translation quality when we normalize these orthographical variants. However, the results show that SMT...
This study describes the construction of the TOCFL (Test Of Chinese as a Foreign Language) learner corpus, including the collection and grammatical error annotation of 2,837 essays written by Chinese language learners originating from a total of 46 different mother-tongue languages. We propose hierarchical tagging sets to manually annotate grammatical errors, resulting in 33,835 inappropriate usages...
In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters, where we are faced with imperfect and dimensionally variable input characters. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than each technique produces by itself and even the ones exhibited by the...
In this paper we described an authorship attribution system for Bengali blog texts. We have presented a new Bengali blog corpus of 3000 passages written by three authors. Our study proposes a text classification system, based on lexical features such as character bigrams and trigrams, word n-grams (n = 1, 2, 3) and stop words, using four classifiers. We achieve best results (more than 99%) on the...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.