Given a set of keywords, we find a maximum Web query (containing the most keywords possible) that respects user-defined bounds on the number of returned hits. We assume a real-world setting where the user is not given direct access to a Web search engine's index, i.e., querying is possible only through an interface
The Language Model (LM) constitutes one of the key components in Keyword Spotting (KWS). The rapid development of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but it is not optimal to use the raw transcripts from the WWW due to the mismatch of content between the web corpus
system called "WebAngels filter" which uses textual and structural content-based analysis. These analyses are based on a violent keyword dictionary. We focus our attention on the keyword dictionary preparation, and we demonstrate that a semi-automatic keyword dictionary can be used to improve the filtering efficiency of
The Internet is becoming an increasingly important platform for everyday life and work. It is expected that keyword extraction can help people quickly find hot spots on the web, since keywords in a document provide important information about the content of the document. In this paper, we propose to use text clustering
This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that give readers a high-level description of its contents. Identifying keywords from a large amount of on-line news data is very useful in that it can
easy to bring the problem of topic excursion. The HITS algorithm requires a number of pages as the base set for its calculation and cannot be used on plain text. This paper introduces a new algorithm, PK-TDC, which makes use of the iterative idea of HITS. PK-TDC searches the authority pages and keywords on the topology of pages
Text keywords at different semantic levels have different semantic representation abilities. Although words have been organized by semantic dictionaries (e.g. WordNet) with exact semantics, the dictionaries cannot be constructed automatically by machine and there are still many words which are not included in the
To help researchers in scientific and technological fields find more suitable information resources on the Web, an auxiliary search structure is proposed, which is a class hierarchy of documents built from the keywords of the documents. To cover the contents of the document properly, the keywords are extracted
In this paper we focus on building a large-scale keyword search service over structured peer-to-peer (P2P) networks. Current state-of-the-art keyword search approaches for structured P2P systems are based on inverted list intersection. However, the biggest challenge in those approaches is that when the indices are
In this work, we compare various text-based pornographic Web filtering techniques. The techniques include blacklist and keyword blocking. The technique called SV is modified to extract a representative feature vector. Each test Web page's feature is extracted and gathered as a vector. The vector is then summarized
First, the related textual information associated with Web images is identified as the candidate annotations for Web images. Second, the word co-occurrence is utilized to eliminate irrelevant keywords for improving the annotation accuracy. Then, the keyword-based association analysis is exploited to further discover
Keyword-based search is one of the most important technologies in text search. However, with the development of the World Wide Web, matching the form of keywords alone is no longer enough. This paper introduces a semantic parsing model built on a symbolic system of concepts for understanding natural language
will be able to identify concepts and relationships from the dataset based on keyword searches in their own workspace and collaborate visually with other analysts using visualization tools such as a concept map view and a timeline view. The system allows analysts to parallelize the work by dividing initial sets of
FCA, a session interest concept is defined as a pair of extent and intent where the extent covers a set of documents selected by the user among the search results and the intent covers a set of keyword features extracted from the selected documents. And, in order to make a concept network grow, we need to calculate the
The lack of an overall ecological knowledge structure is a critical reason for learners' failure in keyword-based search. To address this issue, this paper first presents the dynamic location-aware and semantic hierarchy (DLASH) designed for the learners to browse images, which aims to identify learners' current
results in up to 1.1% absolute Word Error Rate (WER) improvement as compared to keyword-based approaches. The proposed approach reduces the WER by 6.3% absolute in our experiments, compared to an in-domain LM without considering any Web data.
semantic net which can be applied to build a personalized search engine and was tested with single and multiple query keywords under three different calculation policies. The test results show that it can affect the ranking of pages. Personalized search based on the vocabulary semantic net greatly improves the quality of search results.
designed and implemented to resolve the problem of cross-language queries and image retrieval. It can greatly reduce the time and effort required for the search. Experiments with diverse queries on Yahoo images search have shown that the proposed scheme can improve the image results for non-English keyword
provide simple message analysis features such as browsing and simple keyword-based searching of the recorded messages. In this paper, we propose a system, called IMAnalysis, that supports intelligent chat message analysis using text mining techniques. The IMAnalysis system provides functions on chat message retrieval, social
to keyword searching. Thus far, the identification of the facets was either a manual procedure, or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases