The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data are fit for their intended use in operations, decision-making, and planning. However, with the rise of the Semantic Web, new data quality issues appear and require deeper consideration. In this paper, we propose to extend the...
We present an initial utility study of a distributional model of verb selectional preferences for 3rd person pronoun resolution in German. We investigate cases in which 3rd person pronouns occur as subjects of transitive verbs. In each such case, the likelihood of inserting one of the antecedent candidates is calculated as the conditional probability of the antecedent candidate given either the verb governing...
In this paper we explore the written dialog behavior of participants in an online discussion for automatic identification of participants who pursue power within the discussion group. We employ various standard unsupervised machine learning approaches to make this prediction. Our approach relies on the identification of certain discourse structures and linguistic techniques used by participants in...
This work investigates identifying social behaviors (adversarial behavior and influence) of participants in online discussion forums from their language use in English, Arabic, and Chinese. We describe the challenges of annotating implicit information signaled by subtle cues and present two styles of annotation -- one using professional annotators and the other with Mechanical Turk. Our system,...
In this paper, we present a Vietnamese Natural Language Interface to a survey database for individuals and businesses who want to know economic information from economic surveys. We carry out analysis of various Vietnamese question types and investigate the stability of our approach using the GATE framework and the R language. Our system uses the R language to specifically deal with statistical question types...
In this paper, we investigate whether the social goals of an individual's utterances can be recognized through analysis of a discourse's intentional structure. Specifically, we focus on identifying individuals pursuing power within a group. Individuals pursue power in order to increase their control of the actions and goals of the group. Following work in discourse processing, we decompose the problem...
In the context of semantic knowledge bases, among the possible problems that may be tackled by means of data-driven inductive strategies, one can consider those that require the prediction of the unknown values of existing numeric features or the definition of new features to be derived from the data model. These problems can be cast as regression problems so that suitable solutions can be devised...
We define a generic model for finite audio or symbolic musical patterns that structurally encode a rich and abstract synchronization mechanism. This is achieved by distinguishing for each pattern a realization window, describing what the pattern is, from a synchronization window, describing how the pattern can be used. The sequential composition of patterns is defined and studied. An algebra of musical...
Data-oriented applications have experienced a huge growth mainly in distributed settings. The increasing amount of available data has made it hard for users to find the information they need in the way they consider relevant. To help matters, a user-centric approach may be used to enhance query answering and, particularly, provide query personalization. In this work, we address the issue of personalizing...
Query expansion is a crucial step in recall-oriented domains such as Patent Searching. Currently, automatic query expansion in patent search is mostly based on statistical measures. Additional query terms are extracted from the query documents based on entropy measures. To automate query expansion in patent searching, we acquire lexical knowledge from Query Logs of USPTO Patent Examiners. Results...
This paper presents an algorithm that generalizes large sets of contextual situations. Apart from giving details on its mechanisms and implementation, we discuss its use in a context experience sharing system, KRAMER, and simulate its performance as a function of several parameters modelling the expected real data experiment.
Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we give an overview of how statistical data can be managed on the Web. With OLAP2 Data Cube and CSV2 Data Cube we present two complementary approaches on how to extract and publish statistical data. We also discuss the linking,...
Feature sparseness is one of the main causes for Word Sense Disambiguation (WSD) systems to fail, as it increases the probability of incorrect predictions. In this work, we present a WSD method to overcome this problem by using an automatically-created thesaurus to append related words to a specific context, in order to improve the effectiveness of candidate selection for an ambiguous word. We treat...