The Language Model (LM) is one of the key components in Keyword Spotting (KWS). The rapid development of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but it is not optimal to use the raw transcripts from the WWW due to the mismatch of content between the web corpus
system called "WebAngels filter" which uses textual and structural content-based analysis. These analyses are based on a violent keyword dictionary. We focus our attention on the keyword dictionary preparation, and we demonstrate that a semi-automatic keyword dictionary can be used to improve the filtering efficiency of
Query-recommendation systems based on inputted queries have become widespread. These services are effective if users cannot input relevant queries. However, the conventional systems do not take into consideration the relevance between recommended queries. This paper proposes a method of obtaining related queries and clustering them by using the history of query frequencies in query logs. We define...
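The idea of grouping queries by their frequency history can be illustrated with a minimal sketch. It is not the method of the paper, whose definitions are cut off in the snippet above. It assumes each query is represented by a series of counts per time period, that similarity is measured with Pearson correlation, and that clusters are formed greedily against a threshold. The names `pearson`, `cluster_queries`, `threshold`, and all the sample counts are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length frequency series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat series carries no trend information
    return cov / sqrt(var_a * var_b)


def cluster_queries(history, threshold=0.8):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    A query joins the first cluster whose seed query correlates with it
    at or above `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for query, series in history.items():
        for cluster in clusters:
            seed = cluster[0]
            if pearson(history[seed], series) >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical per-period query counts taken from a query log.
history = {
    "world cup": [5, 9, 40, 80, 12],
    "football scores": [4, 8, 35, 70, 10],
    "tax return": [60, 50, 10, 5, 3],
    "tax forms": [55, 48, 12, 6, 2],
}

clusters = cluster_queries(history)
print(clusters)
```

With this sample data, queries whose popularity rises and falls together end up in the same cluster, which is the kind of relatedness a recommender could then use to avoid suggesting near-duplicate queries.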
In this work, we compare various text-based pornographic Web filtering techniques. The techniques include blacklist and keyword blocking. The technique called SV is modified to extract a representative feature vector. Each test Web page's feature is extracted and gathered as a vector. The vector is then summarized
designed and implemented to resolve the problem of cross-language queries and image retrieval. It can greatly reduce the time and effort needed for the search. The experiments on diverse queries on Yahoo image search have shown that the proposed scheme can improve the image results for non-English keyword
to keyword searching. Thus far, the identification of the facets was either a manual procedure, or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases
important words or phrases in the text to other pages, thereby letting users quickly access additional information. An automatic text-annotation system combines keyword extraction and word-sense disambiguation to identify relevant links to Wikipedia pages.
performance. Apart from estimating the best path to follow, our system also expands its initial keywords by using a genetic algorithm during the crawling process. To crawl Vietnamese web pages, we apply a hybrid word segmentation approach which combines automata and part-of-speech tagging techniques for the Vietnamese
their historical and social context by understanding how the major topics associated with them have changed over time. Users can relate articles through time by examining the topical keywords that summarize a specific news event. By tracking the attention to a news article in the form of references in social media (such as
videos, we can only use a title. If there are tags, that is, significant keywords of that multimedia, we can use tag information to search. A tag is a keyword of a text, blog post, or multimedia item. Users have already recognized the value and importance of tags, but only a few users are using them. They might find it annoying to add tags
events. A huge resource of text-based emotion can now be found on the World Wide Web. This paper reports a study to investigate the effectiveness of using SVM (Support Vector Machine) on linguistic features considering emotion keywords and negative words, and classify a collection of blog post sentences tagged
This paper describes a new approach to enhancing textual document search and retrieval. The approach tries to take advantage of structured query languages in search and retrieval. For this purpose, a semantic model of the document is created. The semantic model of the document is an ontology-like structured semantic annotation of the document that can support structured querying. This paper discusses...
This paper introduces a method of automatically constructing a semantic dictionary from the keywords and classification relations of the web encyclopedia Chinese Wikipedia. Semantic units, which are affixes (core/modifier) shared between many phrased-keywords, are selected using a statistical method and string affix matching
with its topic-specific keywords. A hierarchical relationship of super-topics and sub-topics is defined by a taxonomy; meanwhile, Wikipedia is used to provide context and background knowledge for topics that are defined in the taxonomy to guide the term identification and extraction. The experimental results have shown the
degree of relevancy for the user than is currently available with conventional methods, for example, using matching keywords. We describe here our method and the relation between the scenes and discuss a prototype system.