Comparable corpora contain significant quantities of useful data for Natural Language Processing tasks, especially in the area of Machine Translation. They are mainly a source of parallel text fragments. This paper investigates how to effectively extract bilingual texts from comparable corpora, relying on a small parallel training corpus. We propose a new technique to filter non-parallel articles...
AttitudeBuzz is a system that analyzes and presents complex social attitudes based on geolocated social media data. The system uses a machine learning model to apply highly domain-specific sentiment analysis to such data, specifically Twitter, by learning modulators around a configurable lexicon central to the domain of inquiry. Training data are acquired from geographical areas where a specific attitude...
Bug reporting is essentially an uncoordinated process. The same bugs could be repeatedly reported because users or testers are unaware of previously reported bugs. As a result, extra time could be spent on bug triaging and fixing. In order to reduce redundant effort, it is important to provide bug reporters with the ability to search for previously reported bugs. The search functions provided by the...
Depending on the question, various answering methods and answer sources can be used. In this paper, we build a distributed QA system to handle different types of questions and web sources. When a user question is entered, the broker distributes the question over multiple sub-QAs according to question type. The selected sub-QAs find locally optimal candidate answers, which are then collected into the...
Existing Automatic Image Annotation (AIA) systems are typically developed, trained and tested using high-quality, manually labelled images. The tremendous manual effort required, together with an untested ability to scale and tolerate noise, affects existing systems' applicability to real-world data. In this paper, we propose a novel AIA system which harnesses the collective intelligence on the...
Recognition of named entities (people, companies, locations, etc.) is an essential task of text analytics. We address a subproblem of this task, namely, named entity classification. We propose a novel approach that constructs an effective fine-grained named entity classifier. Its key highlights are semi-automatic training set construction from Wikipedia articles and additional feature selection....
This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. For testing, we target a corpus of Wikipedia articles about battles and wars, as these have a high...
We describe a method to retrieve images found on Web pages with specified object class labels, using an analysis of the text around the image and of image appearance. Our method determines whether an object is both described in text and appears in an image using a discriminative image model and a generative text model. Our models are learnt by exploiting established online knowledge resources (Wikipedia...