The Internet and search engines are increasing in prominence in modern life. Search engines such as Google, Bing, and Yahoo are perhaps the largest source of information that anyone can access at any time. People have different interests when using the Internet. Advanced users may be interested in automatically extracting information from pages for later processing and web mining,...
In online business, it is important to construct sales web pages offering attractive services for popular products in order to improve page access as well as purchase rates. Moreover, online shop owners need to hold various types of sales frequently throughout the year to keep customers coming back. Online shopping systems also have to adapt to their circumstances, such as customers' needs and the surrounding...
With the exponential growth of information on the World Wide Web, it is a challenge to find a document that contains specific information on the Internet. Many statistical documents are available on the Web. However, searching for, recognizing, and retrieving these kinds of documents takes much effort and time. One of the solutions that can be used to do this is a web crawler...
Currently, we are faced with a major problem concerning the use of malicious code in hacking attacks, which are becoming increasingly intelligent. We became curious about the major factors behind such cyber incidents, which prompted us to conduct research on the weighted value of resources that are used to commit malicious acts. The major approach of this paper consists in calculating the weighted...
Currently, most existing web cache replacement strategies are based on temporal locality, frequency distribution, and size distribution, while spatial locality receives less attention. This paper proposes a reference model to evaluate the spatial locality of web object accesses, and uses the reference model as the basis of a new cache replacement strategy, GDSR (Greedy-Dual-Size Reference). GDSR...
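The abstract above is truncated before GDSR's details, but the strategy it extends, Greedy-Dual-Size (GDS), is well known. The sketch below is a minimal illustration of GDS with a hypothetical `ref_bonus` parameter standing in for the spatial-locality "reference" term; the class and parameter names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the classic Greedy-Dual-Size (GDS) cache policy that
# GDSR builds on. The ref_bonus parameter is a placeholder for a
# spatial-locality credit (an assumption; the abstract is truncated).

class GDSCache:
    def __init__(self, capacity):
        self.capacity = capacity   # total size budget
        self.used = 0
        self.inflation = 0.0       # the global "L" value in GDS
        self.objects = {}          # key -> (size, cost, H)

    def access(self, key, size, cost, ref_bonus=0.0):
        """Return True on a hit, False on a miss (inserting the object)."""
        if key in self.objects:
            s, c, _ = self.objects[key]
            # On a hit, refresh the object's H value.
            self.objects[key] = (s, c, self.inflation + c / s + ref_bonus)
            return True
        # Evict the lowest-H objects until the new object fits.
        while self.used + size > self.capacity and self.objects:
            victim = min(self.objects, key=lambda k: self.objects[k][2])
            self.inflation = self.objects[victim][2]  # raise L to victim's H
            self.used -= self.objects[victim][0]
            del self.objects[victim]
        if size <= self.capacity:
            self.objects[key] = (size, cost, self.inflation + cost / size + ref_bonus)
            self.used += size
        return False
```

Objects with low cost-per-byte (and, in a GDSR-style variant, weak spatial-reference credit) sink toward the eviction threshold first.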
Comprehensibility is an important quality aspect of documents. Incomprehensible documents are of little utility to readers even if they are relevant. However, for many difficult queries such as technical ones, the topically relevant documents tend to be characterized by poor comprehensibility. This makes it difficult for users to satisfy their information needs when searching for documents about difficult...
On the market there are many commercial web classification services and a few publicly available web directory services. Unfortunately, they mostly focus on English-language web sites, making them unsuitable for other languages in terms of classification reliability and coverage. This paper covers the design and implementation of a web-based classification tool for TLDs (Top-Level Domains). Each domain...
Web entities contain a wealth of information. Users would rather receive a list of relevant entities than a list of web pages when they submit a query to a search engine, so research on related entity finding (REF) is meaningful work. In this paper we investigate the last task of REF: entity homepage finding. We propose a method combining multiple attributes (five attributes)...
The structure of the web has been extensively studied using HTML-based data. However, the increase in dynamic and personalized content has made the analysis of HTML-based data more difficult. A viable alternative to studying the web using HTML data is to study the web using DNS traffic traces. In this paper, we conduct a preliminary study to investigate the question: what can DNS traffic traces tell...
The development of Web bots capable of exhibiting human-like browsing behavior has long been the goal of practitioners on both sides of the security spectrum -- malicious hackers as well as security defenders. For malicious hackers, such bots are an effective vehicle for bypassing various layers of system/network protection or for obstructing the operation of Intrusion Detection Systems (IDSs). For security...
We present SAFEWapp, an open-source static analysis framework for JavaScript web applications. It provides a faithful (partial) model of web application execution environments of various browsers, based on empirical data from the main web pages of the 9,465 most popular websites. A main feature of SAFEWapp is the configurability of DOM tree abstraction levels to allow users to adjust a trade-off between...
In this paper, we design a mechanism to mitigate the device fingerprinting process by utilizing a personal HTTP proxy. For the purpose of detecting and preventing device fingerprinting, the proxy uses a two-stage filtering process: header-based filtering and content-based filtering. We define six attribute values along with their weights based on the value of the highest entropy...
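The abstract's entropy-based weighting is truncated before the formula, but a common reading is to weight each fingerprinting attribute by the Shannon entropy of its observed value distribution. The sketch below illustrates that idea; the function names and the normalization step are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch: weight fingerprinting attributes (e.g. HTTP headers)
# by the Shannon entropy of their observed values. High-entropy attributes
# identify devices better and so receive larger weights.

import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def attribute_weights(observations):
    """observations: dict attribute -> list of observed values.
    Returns each attribute's entropy normalized so the weights sum to 1."""
    entropies = {a: shannon_entropy(v) for a, v in observations.items()}
    total = sum(entropies.values()) or 1.0
    return {a: e / total for a, e in entropies.items()}
```

For example, a User-Agent header seen with many distinct values would get a large weight, while an Accept-Language header that is identical across all clients would contribute nothing.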
Preventing juveniles from accessing pornographic web pages remains a problem in Vietnam. The existing tools fail to block these Vietnamese sites automatically and rely only on configured black lists and white lists. In fact, Vietnamese and English differ in both syntax and semantics; therefore, applying methods used for English to Vietnamese will be much less effective...
The Internet has always been growing with the contents and information added by different types of users. Without proper storage and indexing, these contents can easily be lost in the sea of information housed by the Internet. Hence, an automated program, known as a web crawler, is used to index the contents added to the Internet. With proper configuration and settings, a web crawler can...
A huge number of entities and their relationships are posted on the Web. These entities and their relationship networks support many activities. In this paper, we focus on the task of extracting academic entity networks from homepages. Homepages usually contain many entities, such as persons, conferences/journals, and organizations, and their relationships. However, homepages do not follow a unified layout...
As a means to share knowledge, the community question answering (CQA) service provides users a chance to obtain or provide help by raising or answering questions. After a question is posted, the system must find an appropriate individual to answer this question. Several approaches have recently been proposed to find experts in CQA. In this paper, a new method to find experts in CQA is proposed by...
Nowadays, while we enjoy the convenience brought by such a huge repository of online web information, we may have difficulty finding the web pages related to the particular information we are searching for. Hence, it is essential to classify web documents to facilitate the search and retrieval of pages. Existing algorithms work well with a small quantity of web pages, whereas,...
URL redirection is necessary in web applications. Well-designed redirection makes for a better user experience. However, if used improperly, it can give rise to attacks such as phishing. These improperly used redirections are called Unvalidated Redirects and Forwards (URF). This paper describes a mechanism to systematically discover URF vulnerabilities in web applications. The prototype implementation,...
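The paper's discovery mechanism is truncated here, but the core of a URF check can be illustrated simply: a redirect is unvalidated when a request parameter controls the target and the application does not restrict that target to trusted hosts. The sketch below is an illustrative detector, not the paper's tool; the parameter name `next` and the allow-list approach are assumptions.

```python
# Illustrative URF check: flag a URL whose redirect parameter points to a
# host outside an allow-list. Relative targets (no host) are treated as safe.

from urllib.parse import urlparse, parse_qs

def is_unvalidated_redirect(url, allowed_hosts, param="next"):
    """Return True if the redirect parameter targets an untrusted host."""
    query = parse_qs(urlparse(url).query)
    for target in query.get(param, []):
        host = urlparse(target).netloc
        if host and host not in allowed_hosts:
            return True
    return False
```

A real scanner would also follow forwards, decode nested encodings, and test open-redirect bypass tricks (e.g. `//evil.com`), which this sketch omits.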
In the past, there have been many attempts at developing accurate models of human-like browsing behavior. However, most of these attempts/models suffer from one of the following drawbacks: they either require that some previous history of actual human browsing on the target web site be available (which often is not the case), or they assume that ‘think times’ and ‘page popularities’ follow the well-known...
Wikipedia, a collaborative and user-driven encyclopedia, is considered the largest content thesaurus on the web, expanding into a massive database housing a huge amount of information. In this paper, we present the design and implementation of a MapReduce-based Wikipedia link analysis system that provides a hierarchical examination of document connectivity in Wikipedia and captures the semantic...
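The system's pipeline is not shown in this excerpt; as a hedged illustration, one basic MapReduce step of link analysis is counting in-links per article from (page, outlinks) records. The function names and toy data below are assumptions for illustration only.

```python
# Illustrative map/reduce step of Wikipedia link analysis: count in-links
# per article. map emits (target, 1) per link; reduce sums per target.

from collections import defaultdict

def map_links(page, outlinks):
    """Map: emit (target, 1) for every outgoing link on a page."""
    for target in outlinks:
        yield (target, 1)

def reduce_counts(pairs):
    """Reduce: sum the emitted 1s per target to get in-link counts."""
    counts = defaultdict(int)
    for target, one in pairs:
        counts[target] += one
    return dict(counts)

# Toy link graph: A links to B and C; B links to C; C links to A.
pages = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pairs = [kv for page, links in pages.items() for kv in map_links(page, links)]
in_links = reduce_counts(pairs)  # {"B": 1, "C": 2, "A": 1}
```

In a real MapReduce deployment the shuffle phase groups the emitted pairs by key across machines; the reducer above stands in for that grouping on a single node.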