Search engines often return irrelevant results because they are based on general user preferences. Moreover, they keep users' search logs and other information, which is considered a breach of privacy. Personal web crawlers not only understand the user's precise requirements, but can also be scheduled to grab information automatically at regular intervals. These personal crawlers are...
The Web information that influences the topic relevance of a URL is analyzed, building on research into crawler search strategies. On this basis, a new URL search algorithm based on content and link analysis is proposed. The experimental results show that the algorithm can not only solve the problem of isolated topic islands and thereby increase recall, but can also avoid the phenomenon of...
The PageRank algorithm, proposed by [Page et al., 1998], is used in the Google search engine to improve the results of requests by taking into account the link structure of the Web. PageRank gives the same weight to all pages; that is, its surfer model uses a uniform distribution. Richardson and Domingos have proposed a more interesting and intelligent surfer model combining the link and content...
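The uniform surfer model mentioned in the abstract above can be illustrated with a minimal sketch of PageRank by power iteration. This is not the authors' code: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all assumptions chosen for illustration. The surfer follows each outgoing link of a page with equal probability, and jumps to a uniformly chosen page otherwise.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank with a uniform random-surfer model.

    links: dict mapping a page to the list of pages it links to
           (a hypothetical toy representation, not a real crawl format).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Teleportation: with probability (1 - damping) jump to any page uniformly.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Uniform surfer: each outgoing link gets an equal share.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Assumed toy graph: "a" and "b" both link to "c", and "c" links back to "a".
toy_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

In this toy graph, page "b" has no incoming links, so it only receives the teleportation share and ends up with the lowest rank. A content-aware surfer model would replace the equal `share` per link with weights that depend on page content.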
Keeping search engine indices current by exhaustive crawling is rapidly becoming impossible due to the increasing size of the web. Focused crawlers aim to search only the subset of the web related to a specific topic, and offer a potential solution to the problem. But they also have problems. The major problem is how to retrieve the maximal set of relevant, high-quality pages. To address this problem...
Web forums provide platforms for Internet users around the world to communicate with each other and express their opinions. Many discussions in Web forums involve issues related to terrorism and crime. Some participants even use the platform to propagate their ideology or recruit members to commit crimes. In this work, we propose a Web forum analysis system to analyze the...
Because blogs differ from ordinary Web pages in structure and content, traditional Web page ranking algorithms turn out to be insufficient. To solve this problem, a novel blog ranking algorithm is presented. The algorithm considers both link analysis and content analysis of the blog. It helps mine more implicit features of blogs, such as common topics, to improve the satisfaction of the users...
Web users are often distracted by the large number of results returned from search engines. Clustering can help users efficiently browse pages on a certain topic. However, most traditional clustering methods are based on either content analysis or link analysis alone, which is one-sided. In this paper, we propose an expanding clustering idea with a reasonable combination of content and...
A Weblog is a Web site where entries are made in diary style, maintained by its sole author, a blogger, and displayed in reverse chronological order. Due to the freedom and convenience of publishing in Weblogs, this form of media provides an ideal environment for terrorist groups, both as a propaganda platform to promote their ideologies and as an operation platform for organizing crimes. In this work,...