This paper presents a social computing tool that centers on social scientists. In recent years, we have worked with social scientists and cultural anthropologists, learning how they study subjects in social media, what their needs are, and where their interests lie. In the process, we have built a generic platform for collecting data in the blogosphere, tracking blogs of particular interest,...
In this paper we introduce a framework for semi- to fully-automatic discovery and acquisition of bag-of-words style interest profiles from openly accessible Social Web communities. To do so, we construct a semantic taxonomy search tree from the target domain (the domain for which the profiles are acquired), starting with generic concepts at the root down to specific instances at the leaves, then...
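As a rough illustration of the root-to-leaf acquisition idea in the abstract above, the sketch below walks a toy concept taxonomy and collects every concept label into a bag-of-words profile. The `toy_taxonomy` structure and the `collect_profile` helper are hypothetical stand-ins, not the paper's actual tree or algorithm.

```python
# A hypothetical taxonomy: generic concepts at the root, specific instances at leaves.
toy_taxonomy = {
    "music": {
        "genre": {"jazz": {}, "blues": {}},
        "instrument": {"guitar": {}, "saxophone": {}},
    }
}

def collect_profile(tree, profile=None):
    """Walk the tree from generic root concepts down to specific leaf
    instances, adding every concept label to a bag-of-words profile."""
    if profile is None:
        profile = set()
    for concept, children in tree.items():
        profile.add(concept)
        collect_profile(children, profile)
    return profile

print(sorted(collect_profile(toy_taxonomy)))
# ['blues', 'genre', 'guitar', 'instrument', 'jazz', 'music', 'saxophone']
```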
This article investigates the dynamic features of social tagging vocabularies in Delicious, Flickr and YouTube from 2003 to 2008. It analyzes the evolving usage of the most popular tags in each of these three social networks. We find that for different tagging systems, the dynamic features reflect different cognitive processes. At the macro level, tag growth obeys a power-law distribution...
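A power law f ∝ r^(−α) is linear in log-log space, so the exponent α can be estimated with ordinary least squares on the logged data. The sketch below does this for made-up (rank, frequency) tag counts; the `tag_counts` values are illustrative placeholders, not the paper's measurements.

```python
import math

# Hypothetical (rank, frequency) counts for the most popular tags.
tag_counts = [(1, 50000), (2, 26000), (5, 10500), (10, 5200), (50, 1100), (100, 540)]

# log f = log c - alpha * log r, so alpha is the negated slope in log-log space.
xs = [math.log(r) for r, _ in tag_counts]
ys = [math.log(f) for _, f in tag_counts]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"estimated power-law exponent alpha ~= {-slope:.2f}")
```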
Vertical search engines use focused crawlers as their key component and develop specific algorithms to select web pages relevant to a pre-defined set of topics. Effectively building a semantic pattern for specific topics is therefore extremely important to such search engines. Crawlers are software that traverse the Internet and retrieve web pages by following hyperlinks. Here we propose an...
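One common way a focused crawler scores a fetched page against its pre-defined topic is cosine similarity between the page's term counts and a topic keyword vector. The sketch below shows that generic technique, not the specific algorithm this paper proposes; the sample text and keywords are placeholders.

```python
import math
import re
from collections import Counter

def relevance(page_text, topic_keywords):
    """Cosine similarity between a page's term counts and a topic keyword
    vector: a standard relevance signal for topic-focused crawling."""
    terms = Counter(re.findall(r"[a-z]+", page_text.lower()))
    topic = Counter(k.lower() for k in topic_keywords)
    dot = sum(terms[t] * topic[t] for t in topic)
    norm = (math.sqrt(sum(v * v for v in terms.values()))
            * math.sqrt(sum(v * v for v in topic.values())))
    return dot / norm if norm else 0.0

print(relevance("Jazz guitar lessons and jazz theory", ["jazz", "guitar"]))
```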
A semantic focused crawler is an important part of a semantic vertical search engine. It is receiving increasing attention as a well-founded approach to the problem of locating topical resources on the entire Web. To retrieve documents related to a given topic, in this paper we propose the QBLP algorithm, which enables the crawler to adapt to a changing environment. This feature makes...
Digital library users might not enter a digital library through homepage menus. As a result, digital library owners should consider the visibility of stored PDF documents to search engines. The aim of this research project was to determine to what extent the visibility of these PDF documents can be improved. In a series of empirical experiments, 100 PDF documents stored in digital libraries were identified...
Although intensive research has been performed on P2P network measurement, it is still unknown to what extent the measurement system influences the final measurement results. As an initial study, we investigated the influence of a measurement system on the degree distribution of a P2P network. Theoretical analysis and simulation results suggest an interesting phase-transition phenomenon when...
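To make the measurement-bias point concrete, the sketch below builds a toy random overlay, crawls only a bounded portion of it, and compares the true degree counts with the ones visible to the crawler. The graph model, crawl budget, and all parameters are arbitrary assumptions, not the paper's experimental setup.

```python
import random
from collections import Counter, deque

random.seed(0)

# Toy P2P overlay: a uniform random graph over 1000 peers.
n, m = 1000, 4000
edges = set()
while len(edges) < m:
    a, b = random.sample(range(n), 2)
    edges.add((min(a, b), max(a, b)))
adj = {i: set() for i in range(n)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def observed_degrees(start, budget):
    """BFS-style crawl that stops after `budget` discovered peers, mimicking
    a measurement system that only ever sees part of the network."""
    seen, queue = {start}, deque([start])
    while queue and len(seen) < budget:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    # Degrees are counted only among crawled peers, biasing the distribution.
    return Counter(len(adj[v] & seen) for v in seen)

print("true degrees:    ", Counter(len(adj[v]) for v in adj).most_common(3))
print("measured degrees:", observed_degrees(0, 200).most_common(3))
```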
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Ironically, the very size of this collection has become an obstacle to information retrieval. The user has to sift through scores of pages to come upon the information he or she desires. Web crawlers are the heart of search engines: they continuously crawl the Web and find any new web pages...
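The basic crawl loop this abstract describes (fetch a page, extract its hyperlinks, enqueue the unseen ones) can be sketched with the standard library alone. A minimal sketch, assuming a well-behaved seed URL; the seed, page limit, and timeout below are placeholders, and a real crawler would also honor robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    """Breadth-first crawl: fetch a page, extract hyperlinks, enqueue the
    ones not seen before -- the loop that keeps discovering new pages."""
    seen, frontier = {seed}, deque([seed])
    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page; move on
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return seen

# crawl("https://example.com")  # placeholder seed URL
```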
Distributed Web crawling (DWC) over DHTs has been proposed to solve the bottlenecks of traditional Web crawling. The core of such a system is its fully distributed task-scheduling mechanism, in which the crawlers are treated as peers and the crawlees as resources maintained by the peers. A system model based on the content addressable network (CAN) can further optimize the scheduling...
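In a DHT-based scheduler, each URL hashes into a key space and the peer owning that key region is responsible for crawling it. The sketch below simplifies CAN's d-dimensional coordinate space to a 1-D hash taken modulo the peer count; the peer list is hypothetical and this is not the paper's scheduling model.

```python
import hashlib

PEERS = ["peer-0", "peer-1", "peer-2", "peer-3"]  # hypothetical peer ids

def responsible_peer(url):
    """Map a URL (the 'crawlee') to the peer that owns its key.
    A real CAN partitions a d-dimensional coordinate space among peers;
    hashing into a 1-D space modulo the peer count is a simplification."""
    key = int(hashlib.sha1(url.encode("utf-8")).hexdigest(), 16)
    return PEERS[key % len(PEERS)]

for u in ["http://a.example/x", "http://b.example/y"]:
    print(u, "->", responsible_peer(u))
```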
Websites, identified by their URLs, are large collections of Web pages. Together they form a huge database of heterogeneous information gathered in a distributed fashion. The accumulated information is differentiated on the basis of certain templates, the pages' URLs, and the information they contain. In this research, we concentrate mainly on Web forums. In the current circumstances, a Web crawler crawls all the...
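Differentiating forum pages by URL template can be as simple as matching each URL against a small set of patterns, so a crawler fetches thread pages and skips the rest. The regexes below imitate phpBB-style URLs and are assumptions for illustration; they are not taken from the paper.

```python
import re

# Hypothetical URL templates; real forum software exposes similarly
# regular patterns for boards, threads, and user profiles.
TEMPLATES = {
    "thread": re.compile(r"/viewtopic\.php\?t=\d+"),
    "board":  re.compile(r"/viewforum\.php\?f=\d+"),
    "user":   re.compile(r"/memberlist\.php\?mode=viewprofile"),
}

def classify(url):
    """Differentiate forum pages by matching the URL against known templates."""
    for kind, pattern in TEMPLATES.items():
        if pattern.search(url):
            return kind
    return "other"

print(classify("http://forum.example/viewtopic.php?t=42"))  # -> thread
```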
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. Crawling the Web quickly and entirely is an expensive, unrealistic goal because of the required hardware and network resources. A focused crawler is an agent that targets a particular topic and visits and gathers only a relevant, narrow Web segment while trying not to waste...
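A focused crawler of the kind this abstract describes typically keeps a best-first frontier: a priority queue ordered by estimated relevance, so only promising links are expanded and irrelevant regions of the Web are never visited. A minimal sketch, assuming the caller supplies the relevance scores; the URLs and scores below are placeholders.

```python
import heapq

class Frontier:
    """Best-first crawl frontier. heapq is a min-heap, so scores are
    negated to pop the highest-relevance URL first."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

frontier = Frontier()
frontier.push("http://a.example", 0.2)  # placeholder scores
frontier.push("http://b.example", 0.9)
print(frontier.pop())  # ('http://b.example', 0.9)
```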