The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper we describe methods of performing data mining on web documents, where the web document content is represented by graphs. We show how traditional clustering and classification methods, which usually operate on vector representations of data, can be extended to work with graph-based data. Specifically, we give graph-theoretic extensions of the k-Nearest Neighbors classification algorithm...
The Norme in rete (NIR) [Legislation on the Net] national project aims at making easier the retrieval and the navigation between legal documents in a distributed environment and to encourage the development of systems with characteristics of interoperability and effective of use. In order to obtain this, two standards have been defined: a URN standard, to identify these materials through uniform names,...
Structural analysis of web pages has been proposed several times and for a number of reasons and purposes, such as the re-flowing of standard web pages to fit a smaller PDA screen. elISA is a rule-based system for the analysis of regularities and structures within web pages that is used for a fairly different task, the determination of editable text blocks within standard web pages, as needed by the...
With the ubiquity of the Web, the volume of Web documents continues to grow at a rapid speed. Since the Web is a vast source of information, extracting useful information from Web documents is important. HTML (Hypertext Markup Language), which is a format for visual rendering of Web documents, defines tag for representation of a table. On the other hand, most of the existing HTML documents...
Most e-mail readers spend a non-trivial amount of time regularly deleting junk e-mail (spam) messages, even as an expanding volume of such e-mail occupies server storage space and consumes network bandwidth. An ongoing challenge, therefore, rests within the development and refinement of automatic classifiers that can distinguish legitimate e-mail from spam. A few published studies have examined spam...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.