Data mining has been used in engineering, the sciences, and other fields to analyze system data and solve problems. Its applications extend further to detecting cyber-attacks. We present a simple, low-effort approach, similar to data mining, that detects email-based phishing attacks. This work digs into the HTML contents of emails and web pages referred...
Search engine optimization (SEO) is the process of improving the prominence of a website. Following a reverse-engineering approach, in this paper we study and analyze the key influencing factors in web search. We first build a system that automatically crawls all factors of 200 thousand web pages. Then we perform a content analysis including PageRank, URL and HTML analysis based on top 20...
Web pages are typical semi-structured data. Some tree-based models have been proposed to describe the semantic content structure of web pages in order to facilitate further content analysis. However, most existing models present only the segmentation hierarchy of content blocks rather than the semantic relationships among them. In this work, we propose a novel web page semantic structure model, called...
The Web information that influences the topic relevance of a URL is analyzed, based on research into crawler search strategies. On this basis, a new URL search algorithm based on content and link analysis is proposed. The experimental results show that the algorithm not only solves the topic "isolated island" problem to increase recall, but also avoids the phenomenon of...
To overcome the limitation that data can be collected from only a single source at a fixed time after mining keywords at a rather superficial level, and to make full use of the information returned by search engines to construct a social relationship network based on the semantic links of the searched subject, we carry out regular research using the ROST Content Mining System which...
Keeping search engine indices current by exhaustive crawling is rapidly becoming impossible due to the increasing size of the web. Focused crawlers aim to search only the subset of the web related to a specific topic, and offer a potential solution to the problem. But they also have problems of their own. The major one is how to retrieve the maximal set of relevant, high-quality pages. To address this problem...
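For readers unfamiliar with the idea, the general notion of a focused crawler can be sketched as a best-first traversal that visits the most promising links first. This is only an illustration, not the method of the paper summarized above (whose abstract is truncated). It assumes a toy in-memory web, and the names `relevance`, `focused_crawl`, `pages`, `links`, `threshold`, and `budget` are hypothetical.

```python
import heapq


def relevance(text, topic_terms):
    """Toy score: fraction of topic terms that occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(pages, links, seed, topic_terms, threshold=0.5, budget=10):
    """Best-first crawl over an in-memory web (an assumption for illustration).

    pages: dict mapping url -> page text
    links: dict mapping url -> list of outgoing urls
    Returns the relevant urls in the order they were visited.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    relevant = []
    while frontier and budget > 0:
        _, url = heapq.heappop(frontier)
        budget -= 1
        score = relevance(pages.get(url, ""), topic_terms)
        if score >= threshold:
            relevant.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                # An unvisited link inherits its parent's score as priority.
                heapq.heappush(frontier, (-score, nxt))
    return relevant


pages = {
    "a": "web crawler search",
    "b": "cooking recipes",
    "c": "focused crawler topic search",
    "d": "crawler search index",
}
links = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
result = focused_crawl(pages, links, "a", ["crawler", "search"])
print(result)
```

Links found on relevant pages are explored before links found on irrelevant ones, which is how a limited `budget` of page visits gets spent on the topic of interest. Page "d" is relevant but reachable only through the irrelevant page "b", so it is visited last, and a tighter budget would miss it.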