We argue that verbose queries used for software retrieval contain many terms that follow specific discourse rules, yet hinder retrieval. We report the results of an empirical study on the effect of removing such terms from verbose queries in the context of Text Retrieval-based concept location. In the study, we remove terms from 424 queries, generated from bug reports of nine open source systems....
Measuring "similarity" has been established as a fundamental problem and has been widely studied. In this paper we propose a novel approach for establishing similarity in the context of a citation network. With the rapidly growing size of academic literature, the problem of finding similar research papers has become a challenging task. Research papers in a citation network often form communities based...
The evolution of information retrieval is intimately linked to the evolution of the Web. Although the involvement of different context dimensions in improving the search task has been widely studied, the abundant development of hardware and software opens ways to explore new contextual dimensions. This paper presents the different contextual dimensions studied in the literature and proposes a...
Low information quality is one of the reasons why information extraction initiatives fail. Incomplete information has a pervasive negative impact on downstream processing steps. This work addresses this problem with a novel information extraction approach, which integrates data mining and information extraction methods into a single complementary approach in order to benefit from their respective...
Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy...
Although static ranked lists remain the dominant Web search interface, they can limit the ability of Web searchers to find desired information when it is buried deep in the collection of search results. Web search visualization and Web search personalization are two active research directions that have shown promise for improving the user experience while searching the Web. In this paper, we propose...
Ontologies are playing an increasingly important role in knowledge management and the Semantic Web. The tourism information ontology is becoming a core research field in the realm of information retrieval. An ontology construction method based on Formal Concept Analysis (FCA) to extract domain ontology from unstructured text documents is proposed. Under the framework of our ontology construction method,...
A Max-Probability Density based Clustering (MPDC) algorithm is proposed in this paper to resolve the problem of Word Sense Disambiguation in semantic documents. MPDC takes the context information of a keyword, based on WordNet, into account and selects the max-probability sense by measuring the density of the concept. We also conduct experiments on semantic documents retrieved from Swoogle and Watson, two famous...
Document re-ranking is an intermediate module in an information retrieval system. It is expected to place documents that are more relevant to the query at higher ranks, which benefits automatic query expansion, and it aims at improving the performance of the entire retrieval system. In this paper, we construct a pseudo-labeled document based on the pseudo-relevance feedback principle, and discuss the...
Word Sense Disambiguation (WSD) is the task of selecting the meaning of a word based on the context in which the word occurs. The principal statistical WSD approaches are supervised and unsupervised learning. The Lesk method is an example of unsupervised disambiguation. We present a measure for sense assignment useful for the simple Lesk algorithm. We use word co-occurrences of the gloss and the context,...
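As a rough illustration of the gloss-and-context overlap idea behind the simple Lesk algorithm, here is a minimal sketch. It is not the measure proposed in the paper above, whose details are truncated in this listing; the function name `simplified_lesk` and the toy glosses are hypothetical, and the sketch does no stop-word filtering or stemming.

```python
def simplified_lesk(word, context, glosses):
    """Return the sense whose gloss shares the most words with the context.

    `glosses` maps a sense label to its dictionary gloss (hypothetical toy data here).
    """
    context_words = set(context.lower().split())
    context_words.discard(word.lower())  # the target word itself carries no evidence
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical glosses for illustration only, not taken from any real dictionary.
TOY_GLOSSES = {
    "bank#finance": "an institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water such as a river",
}

print(simplified_lesk("bank", "she sat on the bank of the river and watched the water", TOY_GLOSSES))
```

A real system would take glosses from a lexical resource such as WordNet and would normally remove function words before counting overlaps.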
Both the XML document and the user's query are represented by the set of paths from the root node to the leaf nodes, so the context and content information contained in the corresponding path is a vitally important clue for XML retrieval research. This paper presents an approach, NPathSim, for measuring the similarity between two paths. XML path retrieval was performed to evaluate the performance of NPathSim. The...
Query expansion is a widely studied technique for improving information retrieval effectiveness. In this paper we propose a new query expansion technique using the comprehensive thesaurus WordNet and its semantic relatedness measure modules. Word sense disambiguation is performed on the original query sentence, yielding the concept of each term in the query. Based on those recovered concepts, expanded...