The field of Machine Learning (ML) has gained momentum in almost every domain of research and has recently become a reliable tool in the medical domain. The empirical domain of automatic learning is used in tasks such as medical decision support, medical imaging, protein-protein interaction, extraction of medical knowledge, and overall patient care management. ML is envisioned as a tool by...
In this work, an unsupervised statistical method for automatic correction of preposition errors using the Google n-gram data set is presented and compared to the state-of-the-art. We use the Google n-gram data set in a back-off fashion that increases the performance of the method. The method works automatically, does not require any human-annotated knowledge resources (e.g., ontologies) and can be...
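The general idea of backing off over n-gram counts can be sketched as follows. This is a minimal illustration, not the method of the paper: the function name `pick_preposition`, the `max_context` parameter, and the toy counts are all assumptions made for the example, and a real system would read counts from the Google n-gram data set instead of a small dictionary. The sketch first tries the widest context around the preposition slot and falls back to a narrower one only when no candidate has been observed.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical toy counts; real counts would come from an n-gram data set.
Counts = Dict[Tuple[str, ...], int]


def pick_preposition(
    left: Sequence[str],
    right: Sequence[str],
    candidates: Sequence[str],
    counts: Counts,
    max_context: int = 2,
) -> Optional[str]:
    """Return the candidate with the highest n-gram count, backing off
    from wider to narrower context when no candidate has been seen."""
    for k in range(max_context, 0, -1):
        l_ctx = tuple(left[-k:])   # up to k words before the slot
        r_ctx = tuple(right[:k])   # up to k words after the slot
        scored = {p: counts.get(l_ctx + (p,) + r_ctx, 0) for p in candidates}
        best = max(scored, key=scored.get)
        if scored[best] > 0:
            return best
    return None  # no evidence at any context width


toy_counts: Counts = {
    ("depends", "on", "the"): 120,
    ("depends", "in", "the"): 3,
}
choice = pick_preposition(
    ["it", "depends"], ["the", "weather"], ["in", "on", "at"], toy_counts
)
```

With these toy counts no 5-gram matches, so the function backs off to the 3-gram context and selects "on".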
In this article, we present a novel statistical representation method for knowledge extraction from a corpus containing short texts. We then introduce the contrast parameter, which can be adjusted to target different conceptual levels in text mining and knowledge extraction. The method is based on second order co-occurrence vectors whose efficiency for representing meaning has been established...
This paper describes how the Google Web 1T 5-gram data set, contributed by Google Inc., can be stored so that it can be used efficiently with respect to time. We present an efficient way of accessing all the 5-grams for a specific word of interest from the stored files. We measure the maximum access and processing efficiency achievable for any word of interest. We also compare results (access time...
Interactive simulation games used for training usually require a large amount of coherent narrative content. An effective and efficient solution to the narrative content creation problem is to use Natural Language Generation (NLG) systems. The use of NLG systems, however, requires sophisticated linguistic and sometimes programming knowledge. For this reason, NLG systems are typically not accessible...
In this paper we explore the task of mood classification for blog postings. We propose a novel approach that uses the hierarchy of possible moods to achieve better results than a standard machine learning approach. We also show that using sentiment orientation features improves classification performance. We used the LiveJournal blog corpus as a dataset to train and evaluate our method.
We present a method for correcting real-word spelling errors using the Google Web 1T n-gram data set and a normalized and modified version of the longest common subsequence (LCS) string matching algorithm. Our method is focused mainly on how to improve the correction recall (the fraction of errors corrected) while keeping the correction precision (the fraction of suggestions that are correct) as high...
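For readers unfamiliar with the longest common subsequence, the sketch below computes its length with the standard dynamic-programming recurrence and then applies one possible length normalization (the squared LCS length divided by the product of the two string lengths). The normalization and the function names are assumptions chosen for illustration; the abstract does not specify the exact normalized and modified variant used.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b,
    computed row by row to keep memory at O(len(b))."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product
    of the string lengths, giving a score between 0 and 1."""
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


score = normalized_lcs("form", "from")
```

A score near 1 indicates that a candidate word is orthographically close to the observed word, which is the kind of signal a real-word spelling corrector can combine with n-gram frequency.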