Spam detection is a crucial task in web information retrieval systems. The dynamic nature of information resources, together with continuous changes in users' information demands, makes web spam detection a challenging problem. So far, researchers from different backgrounds have proposed many methods to tackle the problem of spam web pages. In...
Web page classification is the automated assignment of predefined subject categories to a document. Automatic Web page classification is one of the most essential techniques for Web mining, given that the Web is a huge repository of varied information, including images, videos, etc., and that Web pages need to be categorized to satisfy user needs. The classification of Web pages into each category...
Both classification and ranking strategies have been reported to work well for mining named entity (NE) translations from the snippets returned by a Web search engine. Taking the most challenging case, organization names and their translations, as an example, this paper conducts a contrastive study of the two strategies under the SVM framework. We empirically show that the method of translation ranking...
Improvised explosive device (IED) web pages represent a significant source of knowledge for security organizations. In this paper, we present significant improvements to our approach to the discovery and classification of IED-related web pages in the Dark Web. We present a statistical feature ranking approach to the expansion of the keyword lexicon used to discover IED-related web pages, which identified...
Traditional automatic classifiers often produce misclassifications. Folksonomy, a new manual classification scheme based on users' tagging efforts with freely chosen keywords, can effectively resolve this problem. Even though the scalability of folksonomy is much higher than that of other manual classification schemes, the method cannot deal with a tremendous number of items such as whole Weblog articles...
This paper presents a new Web page classification algorithm, CUCS (Combined UC and SVM), for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering), achieving high precision and fast speed. In the training stage, CUCS uses UC to obtain clustering centers, which include positive example centers and negative ones. Then CUCS prunes training...
In recent years, web mining has become a hot topic in data mining with the development of the Internet. Web page classification is one of the essential techniques for web mining, since classifying web pages of a class of interest is often the first step of mining the web. The high-dimensional text vocabulary space is one of the main challenges of web pages. In this paper, we study the capabilities...
Social tagging allows users to assign keywords (tags) to resources, facilitating their future access by the tag creator and possibly by other users. In terms of its support for resource discovery, social tagging has both proponents and critics. This paper investigates whether tags are an effective means of helping users locate useful resources. Adopting techniques from text categorization,...