Link spam techniques can enable some pages to achieve higher-than-deserved rankings in the results of a search engine. They negatively affect the quality of search results. Classification methods can detect link spam. In classification problems, features play an important role. This paper proposes to derive new features from existing link-based features using genetic programming and to use the new features...
With the rapid development of the Internet, popular entities have more and more instances on the Web. It is observed that, on the one hand, different instances of the same Web entity often contain different attributes, and different instances often use different labels for the same attribute; on the other hand, new Web entity instances which contain new attributes and labels are appearing...
With the development of the Internet, network users increasingly expect service providers to deliver the services they need. The paper introduces a basic structure based on the analysis of user behaviour. Within this structure, different contents of the user behaviour are obtained. After these contents are processed, data mining techniques are used to analyze the degree of user interest. And...
Both classification and ranking strategies have been reported to perform well in mining named entity (NE) translations from the snippets returned by a Web search engine. Taking the most challenging case, that of organization names and their translations, as an example, this paper conducts a contrastive study of the two strategies under the SVM framework. We empirically show that the method of translation ranking...
Blog distillation is the process of finding a blog with a principal and recurring interest. In this paper, two baselines are used to validate the results of our experiments. A set of features for each individual feed is first constructed by a decision tree to represent the similarity distribution of every feed against a certain interest. Features are then selected by computing their centroid distances to...
This paper presents a new algorithm for Web page classification, CUCS (Combined UC and SVM), designed for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering), achieving high precision and fast speed. In the training stage, CUCS obtains clustering centers, which include positive example centers and negative ones, by means of UC. Then CUCS prunes training...