Effective classification of web pages can improve the quality of information retrieval. Traditional classification algorithms rely mainly on analysis of web content, but web page content is complicated and filled with a large amount of false and erroneous information, which has seriously affected the accuracy of classifying network information. To solve this problem, this...
Link spam techniques can enable some pages to achieve higher-than-deserved rankings in search engine results, which degrades the quality of those results. Classification methods can detect link spam, and in classification problems features play an important role. This paper proposes to derive new features using genetic programming from existing link-based features and use the new features...
Spam detection is a crucial task in web information retrieval systems. The dynamic nature of information resources, together with continuous changes in users' information demands, makes web spam detection a challenging topic. So far, researchers from different backgrounds have proposed many different methods to tackle the problem of spam web pages. In...
With the rapid growth in popularity of the Internet, crime information on the web is becoming increasingly rampant, and the majority of it is in the form of text. Because much of the crime information in documents is described through events, event-based semantic technology can be used to study the patterns and trends of web-oriented crimes. In our research project on cyber crime mining, we construct...
In this paper, we propose a method to classify Web documents by genre (not by topic) based on features of words and HTML tags. For classification, we use SVM (support vector machine) and Naïve Bayes. To improve classification accuracy, we calculate the discriminant efficiency of each pair of a word and an HTML tag to identify the HTML tags that are effective in classification. The experimental...
Traditional automatic classifiers often produce misclassifications. Folksonomy, a new manual classification scheme based on users tagging items with freely chosen keywords, can effectively resolve this problem. Even though folksonomy scales much better than other manual classification schemes, the method cannot deal with a tremendous number of items such as whole Weblog articles...
Since the Internet has become a huge repository of information, many studies address the issue of web page categorization. For web page classification, we want to find a subset of words that helps to discriminate between different kinds of web pages, so we introduce feature selection. In this paper, we study some feature selection methods such as ReliefF and Symmetrical Uncertainty. Also, the high...
In recent years, web mining has become a hotspot of data mining with the development of the Internet. Web page classification is one of the essential techniques for web mining, since classifying web pages of an interesting class is often the first step of mining the web. The high-dimensional text vocabulary space is one of the main challenges posed by web pages. In this paper, we study the capabilities...