Nowadays, the World Wide Web is a huge repository of voluminous and heterogeneous information that continues to expand in size and complexity. Yet Web pages remain unstructured or semi-structured, and they contain excess irrelevant information such as advertisements, sponsored links, headers, footers, etc. Hence, guiding users to fetch particular documents...
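The abstract motivates stripping boilerplate such as headers, footers, and advertisements before indexing. The truncated text does not state the paper's actual method; a minimal baseline sketch, using Python's standard-library `html.parser` and an assumed list of noise tags, might look like:

```python
from html.parser import HTMLParser

# Tags commonly treated as boilerplate containers (an assumption,
# not the paper's own classifier).
NOISE_TAGS = {"script", "style", "header", "footer", "nav", "aside"}

class MainTextExtractor(HTMLParser):
    """Collect text that lies outside typical boilerplate containers."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level inside noise tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside any noise container.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Real systems typically combine such tag filters with text-density or link-density heuristics, since advertisements are rarely confined to well-labelled tags.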
With the development of Internet technology, quickly locating and identifying the needed page in a sea of pages has become one of the most pressing demands. To acquire video pages from the mass of web pages and analyze their principal characteristics, this paper proposes a technology that combines the form and content features of video pages with a topic-focused web crawler...
As online networks attract more and more attention, it is important to grasp hot topics for news editing and public opinion surveying. On regional BBS forums, hot topics are raised in forum posts by individuals rather than by official media, so the information is varied and hot topics are hard to identify. This paper introduces a regional-BBS-oriented hot topic tracking...
Due to the explosive growth in the availability of Web services over the open Web and the heterogeneity of the sources in which they appear, discovering relevant Web services for a given task remains challenging. To deal with these problems, a bottom-up approach based on finding published service descriptions to automatically build a service repository for a web service discovery...
In this paper, we develop an architecture for facilitating offline Web use to meet the information needs of completely disconnected communities. We illustrate an automatic means of creating Rural Information Portals (RIPs): large, explorable databases of web pages customized to the information needs of a target community. We merge an expert classifier with...
Matching one document against other documents is a core anti-plagiarism task. Matching can be performed both intra- and extra-corpally. This paper discusses extra-corpal matching that utilizes web crawlers for reference search. The role of the web crawler is described within an extra-corpal anti-plagiarism architecture. Matching of plagiarism indications uses a Modified Histogram Intersection based on term N-grams...
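The abstract names "Modified Histogram Intersection based on N-grams of terms," but the truncated text gives no formula. For orientation, a plain (unmodified) n-gram histogram intersection can be sketched as follows; the function names, the choice of word-level trigrams, and the normalization are all illustrative assumptions:

```python
from collections import Counter

def ngrams(text, n=3):
    """Split text into word-level n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def histogram_intersection(doc, ref, n=3):
    """Similarity in [0, 1]: shared n-gram mass over the document's total.

    A score near 1 suggests the suspect document heavily overlaps the
    reference fetched by the crawler.
    """
    h_doc = Counter(ngrams(doc, n))
    h_ref = Counter(ngrams(ref, n))
    shared = sum(min(h_doc[g], h_ref[g]) for g in h_doc)
    return shared / max(sum(h_doc.values()), 1)
```

In the architecture described, the reference texts would be the pages returned by the web crawler, each scored against the suspect document in turn.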
The World Wide Web hosts huge amounts of information on many areas, and education is no exception. Given this huge amount of data, searching for an educational resource manually is very difficult. To overcome this, an intelligent repository of educational resources that helps users decide among the available resources is needed. This paper discusses an attempt to build such a repository. This will help...
URP, a new kind of education management information system model, represents the total integration of the university. The level of URP development reflects the overall level and strength of a university. URP makes an in-depth analysis of the internal relationships within the university and tries to establish unified information standards and to provide a platform and interface specifications by integrating various...
The proliferation of database-driven Web sites has forced users to spend more effort selecting the most satisfying results. Therefore, we propose a search system named DeepSearcher to meet users' needs, which includes offline processing (e.g. pre-processing) and online processing. The latter consists of a query processor, a result integrator, a cache subsystem, and a service portal. To implement the system,...
We present "advaRSS", a crawling mechanism created to support peRSSonal, a mechanism used to create personalized RSS feeds. In contrast to common crawling mechanisms, our system is focused on fetching the latest news from major and minor portals worldwide by utilizing their communication channels. The difference between "advaRSS" and a usual crawler is the fact...
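The abstract describes fetching only the latest news items over a portal's communication channel (its RSS feed) rather than recrawling whole pages. A minimal sketch of that idea, deduplicating new feed items with the standard-library XML parser (the function name and the `seen_links` set are illustrative assumptions, not advaRSS's actual design):

```python
import xml.etree.ElementTree as ET

def latest_items(rss_xml, seen_links):
    """Return (title, link) pairs not seen on earlier polls.

    seen_links is mutated so the next poll of the same feed
    skips everything returned here.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            fresh.append((item.findtext("title"), link))
            seen_links.add(link)
    return fresh
```

A real crawler of this kind would also honor HTTP conditional requests (`ETag`, `Last-Modified`) so that unchanged feeds cost almost nothing to poll.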
The article is devoted to problems related to quality evaluation of Web information resources. The basic tasks and a promising information technology for quality evaluation by a search robot are identified.
This paper proposes an ontology-supported portal architecture, OntoPortal, which integrates internal components, an ontology-supported crawler and classifier, to address both precise searching and fast querying of portals. Preliminary experimental results show that the proposed technology can indeed raise the precision and recall rates of webpage searching,...
Existing search engines such as Google, Yahoo!, and Live are not yet capable of answering queries that require deep semantic understanding of the query or the document. Instead, it is often preferable to find and ask someone who has related expertise or experience on a topic, and thus Web-based online communities have become important places for people to seek and share expertise. We need to gather the data that...
The characterization of HTTP traffic is crucial for performance evaluation and server design. In this paper, we analyze massive Web traces generated by various busy servers in recent years, seeking new features of modern HTTP traffic and user behavior. Comparing the conclusions of earlier studies with our results, we have spotted considerable unconventional ingredients in modern HTTP traffic...
A lot of high-quality, rich data is hidden in backend databases whose pages search engines cannot index; this is called the Deep Web, and it is mostly accessible through query interfaces. SDWS, a semantic search engine for the Deep Web, is presented. We study and apply semantic Web technology to each step of Deep Web information integration, including Deep Web discovery,...