In recent years, the emergence of the Web and the dramatic increase in computing, storage and networking capacity have given rise to the concept of networked information spaces. The prime example of a networked information space is the World Wide Web itself. The Web, in its pure form, is a set of hypertext documents, with links in one document pointing to another document.
For barely a decade now the Web graph (the network formed by Web pages and their hyperlinks) has been the focus of scientific study. In that short a time, this study has made a significant impact on research in physics, computer science and mathematics. It has focussed the attention of the scientific community on all the different kinds of networks that have arisen through technology and human activity;...
This paper studies the expansion properties of randomly perturbed graphs. These graphs are formed by, for example, adding a random or very sparse Erdős-Rényi graph to an arbitrary connected graph. The central results show that there exists a constant δ such that when any connected n-vertex base graph is perturbed by adding a random 1-out then, with high probability,...
The estimated number of static web pages in Oct. 2005 was over 20.3 billion, determined by multiplying the average number of pages per web server (200 pages, based on the results of three previous studies) by the estimated number of web servers on the Internet (101.4 million). However, based on the analysis of 8.5 billion web pages that we crawled by Oct. 2005, we estimate the total number...
We study a geometric random tree model which is a variant of the FKP model proposed in [1]. We choose vertices v1, ..., vn in some convex body uniformly and fix a point . We then build our tree inductively, where at time t we add an edge from vt to the vertex in v1, ..., v...
PageRank is a key element in the success of search engines, allowing them to rank the most important hits on the top screen of results. One key aspect that distinguishes PageRank from other prestige measures such as in-degree is its global nature. From the information provider's perspective, this makes it difficult or impossible to predict how their pages will be ranked. Consequently, a market has emerged...
This paper presents a novel stochastic model that explains the relation between power laws of In-Degree and PageRank. PageRank is a popularity measure designed by Google to rank Web pages. We model the relation between PageRank and In-Degree through a stochastic equation, which is inspired by the original definition of PageRank. Using the theory of regular variation and Tauberian theorems, we prove...
We study the problem of identifying and ranking the members of a community in a very large network with link analysis only, given a set of representatives of the community. We define the concept of a community justified by a formal analysis of a simple model of the evolution of a directed graph. We show that the problem of deciding whether a nontrivial community exists is NP-complete. Nevertheless,...
Users typically locate useful Web pages by querying a search engine. However, today’s search engines are seriously threatened by malicious spam pages that attempt to subvert the unbiased searching and ranking services provided by the engines. Given the large fraction of Web traffic originating from search engine referrals and the high potential monetary value of this traffic, it is not surprising...
We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions, extending analytical formulae known for the strongly preferential case, and corroborating...
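To illustrate the distinction mentioned in the abstract above, here is a minimal power-iteration sketch. It assumes the common convention in which "strongly preferential" PageRank redistributes the rank of dangling nodes according to the preference vector v, while "weakly preferential" PageRank redistributes it uniformly. The function name `pagerank`, the damping factor 0.85 and the three-node graph are illustrative choices, not taken from the paper.

```python
def pagerank(out_links, v, u, alpha=0.85, iters=100):
    """Power iteration for PageRank with explicit dangling-node handling.

    out_links: list where out_links[i] is the list of successors of node i
    v: preference (teleportation) distribution
    u: distribution used to redistribute the rank of dangling nodes
    """
    n = len(out_links)
    r = list(v)
    for _ in range(iters):
        # Teleportation term.
        new = [(1.0 - alpha) * v[j] for j in range(n)]
        dangling_mass = 0.0
        for i in range(n):
            if out_links[i]:
                share = alpha * r[i] / len(out_links[i])
                for j in out_links[i]:
                    new[j] += share
            else:
                dangling_mass += r[i]
        # Patch dangling nodes: their rank is spread according to u.
        for j in range(n):
            new[j] += alpha * dangling_mass * u[j]
        r = new
    return r


# Hypothetical 3-node chain 0 -> 1 -> 2, where node 2 is dangling.
graph = [[1], [2], []]
v = [1.0, 0.0, 0.0]          # non-uniform preference vector
uniform = [1.0 / 3] * 3

strong = pagerank(graph, v, u=v)        # dangling nodes follow v
weak = pagerank(graph, v, u=uniform)    # dangling nodes follow uniform
```

With a non-uniform preference vector the two variants give different rankings on this graph. With a uniform v they coincide.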
One of the most useful measures of cluster quality is the modularity of the partition, which measures the difference between the number of edges joining vertices from the same cluster and the expected number of such edges in a random (unstructured) graph. In this paper we show that the problem of finding a partition maximizing the modularity of a given graph G can be reduced to a minimum weighted...
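As a concrete illustration of the modularity measure described above, the sketch below computes it for an undirected, unweighted graph. It assumes the widely used Newman-Girvan normalization, Q = sum over clusters c of e_c/m - (d_c/(2m))^2, where m is the number of edges, e_c the number of edges inside cluster c, and d_c the total degree of its vertices. The paper may use a different but equivalent form. The function name `modularity` and the example graph are hypothetical.

```python
def modularity(edges, cluster_of):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: list of (a, b) pairs, each undirected edge listed once
    cluster_of: dict mapping each vertex to its cluster label
    """
    m = len(edges)
    intra = {}        # number of edges with both endpoints in the cluster
    degree_sum = {}   # sum of vertex degrees per cluster
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            intra[ca] = intra.get(ca, 0) + 1
    # Observed fraction of intra-cluster edges minus the fraction expected
    # in a random graph with the same degree sequence.
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single edge, split into their natural clusters.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, clusters)
```

Placing all vertices in a single cluster gives a modularity of zero under this definition, which is why the measure rewards partitions with more internal edges than chance would predict.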
In this paper, a phrase recommender algorithm is proposed that suggests related frequent phrases for an incomplete user query. The suggested phrases are extracted from past user queries based on how frequently the phrases occur. A query recommender algorithm called OQD (Online Query Discovery) has also been designed for comparison purposes. Simulation results show the efficiency of the proposed...
Generative models are often used in modeling real-world graphs such as the Web graph in order to better understand the processes through which these graphs are formed. To determine whether a graph might have been generated by a given model, one must compare the features of that graph with those generated by the model. We introduce the concept of a hierarchical degree core tree as a novel way of...
The link structure of the Web is generally viewed as the webgraph, and web structure mining is a research area that mainly aims to find hidden communities and similar structures in the Web by focusing on the webgraph. In this paper, we identify a common frequent substructure by observing the webgraph and define it as an isolated star (i-star). We propose an efficient enumeration algorithm for i-stars, and...
One of the grand research and industrial challenges in recent years is efficient web search, inherently involving the issue of page ranking. In this paper we address the issue of representing and quantifying web ranking trends as a measure of web pages. We study the rank position of a web page among different snapshots of the web graph and propose normalized measures of ranking trends that are comparable...