Search engine optimization (SEO) is the process of improving the prominence of a website. In this paper, following a reverse engineering approach, we study and analyze the key factors that influence web search. We first build a system to automatically crawl all factors of 200 thousand web pages. We then perform a content analysis, including PageRank, URL, and HTML analysis, based on top 20...
A better understanding of a document's logical components is crucial to many applications, e.g., document classification or data integration. With the development of digital libraries, more people realize the importance of scientific tables, which present valuable information concisely. Although much previous work on tables focuses on table data extraction, few concrete works on understanding and utilizing...
Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of this kind of page. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are not only related to quantitative data extracted from the Web pages, but also...
The Web information that influences the topic relevance of a URL is analyzed, building on research into crawler search strategies. On this basis, a new URL search algorithm based on content and link analysis is proposed. The experimental results show that the algorithm can not only solve the problem of topic-isolated islands to increase recall, but also avoid the phenomenon of...
To solve the problem that data can be collected from only a single data source at a fixed time after mining keywords at a rather superficial level, and to make full use of the information returned by search engines to construct a social relationship network based on the semantic links of the searched subject, we conduct regular research using the ROST Content Mining System, which...
The PageRank algorithm, proposed by [Page et al., 1998], is used in the Google search engine to improve the results of requests by taking into account the link structure of the Web. PageRank gives the same weight to all pages; that is, its surfer model is based on a uniform distribution. Richardson and Domingos have proposed a more interesting and intelligent surfer model combining the link and content...
Spam is no longer limited to emails and Web pages. The increasing penetration of spam in the form of comments in blogs and social networks has started to become a nuisance and a potential threat. In this work, we explore the challenges posed by this type of spam in the blogosphere with substantial generalization regarding other social media. Thus, we investigate the characteristics of comment spam...
Maintaining the currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size of the web. Focused crawlers aim to search only the subset of the web related to a specific topic, and offer a potential solution to the problem. However, focused crawling has problems of its own. The major problem is how to retrieve the maximal set of relevant and quality pages. To address this problem...
Social networks have recently attracted much attention for their importance to the semantic Web. Several methods exist to extract social networks for people from the Web based on co-occurrence information. This paper proposes a content-analysis-based method for automatically obtaining social networks among various entities from Chinese event-based news stories. First, the input articles are annotated...
Content analysis of search engine user queries is an important task for search engine research, and identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. The purpose of this study is to provide automatic new topic identification of search engine query logs, and estimate the effect of statistical characteristics of search engine...
Web users are often distracted by the large number of results returned by search engines. Clustering can help users efficiently browse pages on a certain topic. However, most traditional clustering methods are based on either content analysis or link analysis alone, which is one-sided. In this paper, we propose an expanding clustering idea that reasonably combines content and...