Knowledge discovery from the Web is a cyclic process. In this paper we focus on one important part of that process: transforming unstructured information from Web pages into structured relations. Relation extraction systems capture information from natural language text on Web pages, called Web text. However, extraction is costly and time-consuming. Worse, many Web pages may not contain a textual representation...
With the rapid spread of the Internet, crime-related information on the Web is becoming increasingly prevalent, most of it in the form of text. Because much of the crime information in documents is described through events, event-based semantic technology can be used to study the patterns and trends of Web-oriented crimes. In our research project on cyber crime mining, we construct...
This paper briefly introduces the concept and current state of information retrieval techniques and proposes an automatic Web information retrieval system based on multiple agents. It analyzes the system's model architecture and working principle, gives solutions to critical problems, and introduces a support vector machine (SVM) into the intelligent filter subsystem in order to realize intelligent classifying and retrieving...
The explosive growth of the Web makes it hard to organize and manage Web information automatically. Therefore, online learning methods such as incremental learning are gradually becoming effective instruments in practical applications. In our experiments, traditional incremental learning shows some flaws in the iterative process. To overcome the drawback caused by using only support vectors to represent the whole former...
In this paper, we propose a method to classify Web documents by genre (not by topic) based on features of words and HTML tags. For classification, we use SVM (support vector machine) and Naïve Bayes. To improve classification accuracy, we calculate the discriminant efficiency of each pair of a word and an HTML tag to identify the HTML tags that are effective in classification. The experimental...
The increasing number of Web pages reduces the effectiveness of document retrieval in matching users' needs. Classifying Web pages is one way to address this problem. This paper proposes the VAMSVM_WPC model, a novel voting algorithm for classifying Web pages that uses a multi-class SVM method. First, features are generated from text and...
Catalog pages form the intermediate layer in the architecture of a standard Web site; research on information retrieval for this kind of page can therefore help improve a Web crawler's efficiency. A page is called "catalog-style" if its main body is displayed as a sequence of regular entries, and the central link in each entry apparently contains the page's major information...
This paper presents a new Web page classification algorithm, CUCS (Combined UC and SVM), for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering), achieving high precision and fast speed. In the training stage, CUCS obtains clustering centers, which include positive example centers and negative ones, by means of UC. Then CUCS prunes training...