Text analysis of a web page is more difficult than analysis of the text of a normal document because of additional information such as HTML structure, styling codes, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The
integrate information from multiple interrelated pages to answer keyword queries meaningfully. Next-generation web search engines require link-awareness, or more generally, the capability of integrating correlative information items that are linked through hyperlinks. In this paper, we study the problems of identifying the
To improve Web page search results and enhance Web crawling, this paper proposes Web page clustering based on search keywords. The method first uses the matching degree between Web pages and search keywords to decide the order in which result pages are shown. Then
huge irrelevant search hits. In this paper, we propose an improved method for ranking search results to reduce the human effort of locating interesting hits. The search results are re-ranked using adaptive user interest hierarchies (AUIH), which consider both investigator-defined keywords and user interest learnt from
The growing volume of literature in journal databases raises a new and challenging search problem: locating the desired literature. Traditional keyword search is insufficient: the specific literature users require may not be captured. We introduce a new hierarchical clustering algorithm. With this algorithm, we
sense discovery problem. Given a query and a list of result pages, our unsupervised method detects word sense communities in the extracted keyword network. The documents are assigned to several refined word sense communities to form clusters. We use the modularity score of the discovered keyword community structure to
This paper proposes a system for finding a user's interests on the Internet. It is based on his browsing behaviors and the contents of his visited pages. The system has two features. One is that it builds the user's browsing interests implicitly, as multiple keyword vectors, one per interest. The other is that it can
This paper explores a unique way in which the thinking algorithm adds an extra logical substrate to a Web search query using artificial intelligence. Instead of just going after keyword searching, the algorithm tries to assess the motives of the user behind entering a query. The algorithm tries to find the reasons as
In this paper, the current classification is reclassified through K-means based on feedback from Web usage mining, in order to improve the accuracy of news recommendation and the convergence of classification. It can extract the most relevant keywords and eliminate the disturbance of multi-vocal
how to eliminate ambiguity more easily and recommend more interesting web pages to users. To resolve the above problems, we propose a novel mechanism named SSTAG, which can recommend a set of Super-tags for users to choose from based on keyword input. As various topics related to the keywords, the Super-tags are
done on a set of data is chosen to form the basis as done with keywords. If the base data is chosen arbitrarily, it is automatic, whereas if some 'knowledge' or 'background' is put into the choice, it is adaptive. Statistical features of the images are extracted from the pixel map of the image. The extracted features are
being browsed, to discover relevant keywords for each document, and to effectively cluster the documents into semantically-significant groupings. The quality of the links is improved over time through passive user feedback collection. Our system can be deployed as a web service and has been tested on offline datasets as
terms from the URL, Title tag and Meta tag to produce clusters of web documents. The reason for selecting these parts of a web page is that they contain keywords which are available in a web page. The clustering algorithm used in this paper is K-means. The proposed clustering method is compared with snippet-based clustering in
of keywords and the extended vector. Experiments show that the method can model hierarchical user interests with a promising result. When a new interest emerges, it does not need any adjustment like collecting new training data or rebuilding the classifier. It can capture the diversified user interests and map to an