Recently, a growing number of studies have focused on the issues raised by knowledge discovery from online information, particularly the problems of tracking topics, ideas, and users' spreading influence across the Web. In this paper, search-engine query logs are analyzed for Topic Detection and Tracking (TDT), rather than to study the quality of search results or query recommendation. By...
Keyword-based search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. We present an approach for predicting the number of hits for a given set of query terms. Using word frequencies derived from a large corpus, we construct random samples of combinations of these words as search terms...
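As a point of comparison for the hit-prediction idea above, here is a minimal sketch of a naive baseline. It is not the approach of the abstract: it simply assumes that query terms occur independently across documents, so the expected number of hits for a conjunctive query is the collection size times the product of the per-term document fractions. The function name `expected_hits`, the `doc_freqs` table, and the collection size are hypothetical values chosen for illustration.

```python
from math import prod


def expected_hits(doc_freqs, terms, num_docs):
    """Naive estimate of hits for an AND query.

    Assumes (unrealistically) that terms occur independently, so the
    expected hit count is num_docs * product(df(term) / num_docs).
    A term missing from doc_freqs is treated as having frequency 0.
    """
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return num_docs * prod(doc_freqs.get(t, 0) / num_docs for t in terms)


# Hypothetical document frequencies for a collection of one million documents.
doc_freqs = {"search": 200_000, "engine": 100_000, "pruning": 1_000}
num_docs = 1_000_000

two_terms = expected_hits(doc_freqs, ["search", "engine"], num_docs)
unknown_term = expected_hits(doc_freqs, ["search", "zzz"], num_docs)
print(round(two_terms), unknown_term)
```

Real term co-occurrence is far from independent, which is one reason a learned or sampled predictor may be preferable to this baseline.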
Static index pruning techniques aim to remove from the posting lists of an inverted file the references to documents that are unlikely to be relevant for answering user queries. The reduction in the size of the index results in a better exploitation of memory hierarchies and faster query processing. On the other hand, pruning may affect the precision of the information retrieval system, since...
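To make the idea of static pruning concrete, the following is a minimal sketch of one simple term-centric variant: keep only the k highest-scoring postings of each term. This is an illustrative assumption, not the technique evaluated in the abstract; the index layout (a dict from term to a list of `(doc_id, score)` pairs) and the name `prune_index` are hypothetical.

```python
def prune_index(index, k):
    """Return a pruned copy of an inverted index.

    index maps each term to a list of (doc_id, score) postings.
    For every term only the k highest-scoring postings are kept,
    and the survivors are returned in doc_id order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    pruned = {}
    for term, postings in index.items():
        # Select the k postings with the largest score.
        top = sorted(postings, key=lambda p: p[1], reverse=True)[:k]
        # Restore doc_id order, as posting lists are usually stored sorted.
        pruned[term] = sorted(top)
    return pruned


# Hypothetical toy index.
index = {
    "web": [(1, 0.9), (2, 0.1), (5, 0.4)],
    "cache": [(2, 0.7)],
}
pruned = prune_index(index, 2)
print(pruned)
```

The trade-off mentioned in the abstract shows up directly here: document 2 can no longer be retrieved for the term "web", even though the index is smaller.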
User-generated reviews play an important role for potential consumers in making purchase decisions. However, the quality and helpfulness of user-generated reviews are unknown unless consumers read through them. Automatically predicting the helpfulness of user-generated reviews can assist consumers in discovering helpful reviews. Existing helpfulness assessment models make use of the positive vote...
The current search engine model treats users as untrustworthy, so no tools are provided to let them specify what they are looking for or in what context, which severely limits what they are able to achieve. Instead, search engines try to guess this, which is currently done using "implicit feedback". In this paper we propose a "web exploration engine" - a model where users can use the...
As more and more online sources become available on the Web, it becomes very time-consuming, if not impossible, to visit and search all web sites one by one. Many search engines have been developed to help users find the information they need. However, search engines work poorly for online sources whose data often reside in the deep web, which is not part of the surface web indexed by standard search engines...
In this paper, we propose a Relation Expansion framework, which uses a few seed sentences marked up with two entities to expand a set of sentences containing target relations. During the expansion process, a label propagation algorithm is used to select the most confident entity pairs and context patterns. The label propagation algorithm is a graph-based semi-supervised learning method which models...
Recent years have seen a huge increase in the amount of publicly-available information relevant to drug discovery, including online databases of compound and bioassay information; scholarly publications linking compounds with genes, targets and diseases; and predictive models that can suggest new links between compounds, genes, targets and diseases. However, there is a lack of tools and methods to...
This paper presents a straightforward, extensive, and noise-resistant method for efficiently tagging a web query, submitted to a search engine, with proper category labels. These labels are intended to represent the closest categories related to the query which can ultimately be used to enhance the results of any typical search engine by either restricting the results to matching categories...
Latent relational search is a new search paradigm based on the degree of analogy between two word pairs. A latent relational search engine is expected to return the word Paris as an answer to the question mark (?) in the query {(Japan, Tokyo), (France, ?)} because the relation between Japan and Tokyo is highly similar to that between France and Paris. We propose an approach for exploring and indexing...
Designing personalized search engines based on a recommender system that takes into consideration the user situated moment in relation to the subject matter and the context that governs user interest has been largely ignored. In this paper, we present a novel approach to integrating user interests into search within a recommender system that is guided by the semantic representation of the user and...
Like search engines, recommender systems have become a tool that cannot be ignored by websites with a large selection of products, music, news or simply webpage links. The performance of this kind of system depends on a large amount of information. At the same time, the amount of information on the Web is continuously growing, especially due to increased User Generated Content since the emergence...
Web cache replacement algorithms proposed in the literature try to maximize the Hit Ratio (HR), the Byte Hit Ratio (BHR), and the Delay Saving Ratio (DSR). However, even with an infinite Web cache storage capacity, values of these metrics could not exceed 70% most of the time. This is due to the fact that, given a workload, the first reference to an object is always a miss. Moreover, a statistical...
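The bound described above can be illustrated with a minimal sketch: with unlimited storage nothing is ever evicted, so the only misses are first references, and the resulting HR and BHR are upper limits for any replacement policy on that workload. The trace format (a list of `(object_id, size_in_bytes)` requests), the function name `infinite_cache_bounds`, and the sample trace are assumptions made for illustration, and the sketch further assumes objects are never modified or invalidated.

```python
def infinite_cache_bounds(trace):
    """Return (hit_ratio, byte_hit_ratio) for an unbounded cache.

    trace is a sequence of (object_id, size_in_bytes) requests.
    The first reference to an object is always a miss; every later
    reference is a hit because nothing is ever evicted.
    """
    if not trace:
        return 0.0, 0.0
    seen = set()
    hits = 0
    hit_bytes = 0
    total_bytes = 0
    for obj, size in trace:
        total_bytes += size
        if obj in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(obj)  # compulsory (first-reference) miss
    hr = hits / len(trace)
    bhr = hit_bytes / total_bytes if total_bytes else 0.0
    return hr, bhr


# Hypothetical workload: six requests over three distinct objects.
trace = [("a", 100), ("b", 300), ("a", 100), ("c", 50), ("a", 100), ("b", 300)]
hr, bhr = infinite_cache_bounds(trace)
print(hr, bhr)
```

In this toy trace half of the requests are first references, so no policy can exceed a hit ratio of 0.5 regardless of cache size.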
The massive size of Wikipedia and the ease with which its content can be created and edited have made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and...
Our recent study found that humans are more sensitive to the semantic difference caused by categorization than to that caused by specification. Based on this finding, this paper proposes a novel weighted edge approach embedding the specification levels of both words and their Least Common Ancestor (LCA) into a weighted graph distance by exponentially decreasing the weight along its specification level. Experimental...
In this paper we discuss the collection, semantic annotation and analysis of real-time social signals from micro blogging data. We focus on users interested in analyzing social signals collectively for sense making. Our proposal enables flexibility in selecting subsets for analysis, alleviating information overload. We define an architecture that is based on state-of-the-art Semantic Web technologies...
The Chem2Bio2RDF portal is a Linked Open Data (LOD) portal for systems chemical biology that aims to facilitate drug discovery. It converts around 25 different datasets on genes, compounds, drugs, pathways, side effects, diseases, and MEDLINE/PubMed documents into RDF triples and links them to other LOD bubbles, such as Bio2RDF, LODD and DBpedia. The portal is based on D2R server and provides a SPARQL...
Fuzzy ontologies are regarded in the Semantic Web community as useful formalisms for dealing with vagueness. Description logics (DLs) are the logical foundations of standard web ontology languages. Conjunctive queries are regarded as an expressive reasoning service for DLs. DL reasoners can be enriched by a conjunctive query service. In this study, we focus on fuzzy (threshold) conjunctive queries over...
In W3C's Rule Interchange Format (RIF), F-Logic rules have received considerable attention as a major logical rule formalism, while combinations of rules with Description Logic (DL) ontologies in RIF, let alone with F-Logic rules, are far less developed. To mend this, we first present F-Logic# knowledge bases, a framework based on the semantics of the well-investigated dl-programs, that provides a...
Most of today's business processes are complex and involve more than one party or more than a single-step procedure. On the Web, this is reflected by the existence of billions of Web sites, which may be regarded as complex processes, and on the other side only a few thousand publicly available WSDL files that present single services. The availability of semantic descriptions of services and processes...