Because the information on the Web is vast, dynamic, and irregular, it is difficult to search and integrate. This paper proposes a Web information extraction algorithm based on ontology and the DOM tree. The relevant areas are located accurately, and the information of interest is extracted precisely by extraction rules generated from the ontology. Furthermore this algorithm...
This paper studies the problem of extracting data from large numbers of semi-structured web pages. Because many websites have enormous numbers of pages generated dynamically from an underlying structured source such as a database, it is feasible to induce a common template for similar web pages and then extract data accordingly. Previous work on this problem has limited practical utility because of either...
With the explosive growth of information available on the World Wide Web, it has become much more difficult to access relevant information. One promising approach is web usage mining, which mines web logs for user models and recommendations. Unlike most web recommender systems, which are mainly based on clustering and association rule mining, this paper proposes a web personalization...
This paper presents a Web framework as an application of steganography. The framework enables us to stealthily organize a tree of Web objects behind another. The Web objects to be embedded are automatically assigned to appropriate cover files. When embedding is done, we obtain stego-objects that can be uploaded to a Web server as ordinary Web objects. We can retrieve files embedded in stego-objects...
In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, data can be handled in a way similar to instances of a traditional database, which in turn can facilitate Web data integration and the solution of various other domain-specific problems. In this paper, we propose...
We propose a new technique to infer the structure of, and extract data tokens from, semi-structured Web sources that are generated using a consistent template or layout with some implicit regularities. The attributes are extracted and labeled in reverse, starting from the region of interest of the targeted content. This is in contrast with the existing techniques which always generate the trees from the...
In today's world of communication and information interchange, where more and more users must use the same services and data with different access levels, protection against potential breaches of secured data has become critically important. Some of the common services are virtual private networks (VPN), remote access servers (RAS), Web servers, mail servers, etc. In the...
Catalog pages form the intermediate layer in the architecture of a standard Web site; therefore, research on information retrieval for this kind of page can help improve Web crawler efficiency. A page is called "catalog-style" if its main body is displayed as a sequence of regular entries, and the central link in each entry apparently contains the page's major information...
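As an illustration of the "catalog-style" definition above, the following is a minimal sketch of a heuristic check, not the method of the listed paper (whose abstract is truncated). It assumes that "a sequence of regular entries" can be approximated by a run of same-tag sibling elements that each contain a link. The names `TreeBuilder`, `contains_link`, `is_catalog_style`, and the `min_entries` threshold are hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """Lightweight element node; text content is ignored in this sketch."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from HTML using only the standard library."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def contains_link(node):
    """True if the node is an <a> element or has one among its descendants."""
    return node.tag == "a" or any(contains_link(c) for c in node.children)


def is_catalog_style(html, min_entries=3):
    """Assumed heuristic: some element has at least `min_entries` children,
    all with the same tag, and every one of them contains a link."""
    builder = TreeBuilder()
    builder.feed(html)
    stack = [builder.root]
    while stack:
        node = stack.pop()
        kids = node.children
        if (
            len(kids) >= min_entries
            and len({k.tag for k in kids}) == 1
            and all(contains_link(k) for k in kids)
        ):
            return True
        stack.extend(kids)
    return False
```

A real classifier would also need to weigh the visual prominence of the entry list and identify the central link in each entry; this sketch only tests structural regularity.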
WAP-Mine is one of the algorithms for mining frequent web access patterns from a web access database. It generates frequent web access patterns by recursively mining web access pattern trees using the WAP-tree structure. However, while mining frequent web access patterns, WAP-Mine generates a large amount of intermediate data, which lowers efficiency, especially at low support thresholds. In this paper, TD-Mine, a new algorithm...
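To make the WAP-tree idea concrete, here is a minimal sketch of how such a tree is commonly constructed: events below the support threshold are dropped, and the remaining subsequences are inserted into a prefix tree with per-node counts. It covers only tree construction, not the recursive mining step of WAP-Mine or the TD-Mine algorithm from the abstract. The names `WapNode` and `build_wap_tree` are hypothetical, and support is assumed to be an absolute count of sequences containing the event.

```python
from collections import Counter


class WapNode:
    """One node of the prefix tree: an event label, a count, and children."""

    def __init__(self, event):
        self.event = event
        self.count = 0
        self.children = {}


def build_wap_tree(sequences, min_support):
    """Build a WAP-tree-like prefix tree from web access sequences.

    `sequences` is an iterable of event sequences (for example, page IDs
    visited in one session). `min_support` is an absolute count (assumption).
    """
    sequences = [list(seq) for seq in sequences]

    # Pass 1: an event's support is the number of sequences containing it.
    support = Counter()
    for seq in sequences:
        support.update(set(seq))
    frequent = {event for event, n in support.items() if n >= min_support}

    # Pass 2: insert each sequence, skipping non-frequent events.
    root = WapNode(None)
    for seq in sequences:
        node = root
        for event in seq:
            if event not in frequent:
                continue
            child = node.children.get(event)
            if child is None:
                child = WapNode(event)
                node.children[event] = child
            child.count += 1
            node = child
    return root, frequent
```

For example, `build_wap_tree(["abdac", "eaebcac", "babfaec", "afbacfc"], 3)` keeps only the events `a`, `b`, and `c`, so the four sessions are inserted as `abac`, `abcac`, `babac`, and `abacc`; three of them share the prefix `ab`.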
With the rapid development of the Internet, the Web has become an enormous database containing extremely abundant information. Most Web pages present their content as a list of objects, such as search engine results or product information on shopping Web sites, and these objects form the primary information of each page. In this paper, we focus on the issues of...
To realize Web information retrieval using characteristic tree-structured patterns in semistructured Web documents, methods for discovering frequent patterns or common characteristics in semistructured documents are becoming more and more important. We have studied methods for discovering maximally frequent tree-structured patterns in semistructured Web documents. A tag tree pattern is an edge...