Large-scale streaming URLs are the norm in many commercial software products that aim to filter URLs based on their sensitivity or risk level. In such problem scenarios, filtering is typically done by classifying a URL using either its webpage content or certain additional contextual information. However, such approaches are slow and computationally expensive, as they require gathering and processing...
We describe a novel technique for clustering short texts, such as URLs, without enriching them with external knowledge sources. Our technique first performs feature clustering to identify the key features of the dataset and then reconstructs the dataset on the basis of those key features. Then, it computes the similarity of the short texts belonging to the reconstructed dataset...
This work addresses the problem of URL topic classification using the text of Uniform Resource Locators (URLs). We introduce a method for classifying web pages into topics by extending the Jaccard distance measure and using the n-gram approach. We also compare our method with the best-performing known distance measures for Boolean data in the literature, i.e., Jaccard, Dice...
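To make the idea of comparing URLs by n-gram overlap concrete, here is a minimal sketch of the standard (unextended) Jaccard distance over character n-grams. It is not the extended measure the abstract refers to, which the snippet does not specify. The function names, the choice of trigrams, and the nearest-neighbour labelling step are all assumptions made for illustration.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased string.

    Strings shorter than n yield a single gram (the string itself),
    and the empty string yields an empty set.
    """
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Standard Jaccard distance: 1 - |A intersect B| / |A union B|."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_topic(url: str, labelled: dict[str, str], n: int = 3) -> str:
    """Hypothetical helper: return the topic of the labelled URL
    whose n-gram set is closest to that of `url`."""
    grams = char_ngrams(url, n)
    best = min(
        labelled,
        key=lambda known: jaccard_distance(grams, char_ngrams(known, n)),
    )
    return labelled[best]


if __name__ == "__main__":
    # Made-up URLs and topics, purely for demonstration.
    examples = {
        "example.com/sports/football-scores": "sports",
        "example.org/recipes/pasta": "cooking",
    }
    print(nearest_topic("news.example.net/football/results", examples))
```

Because the n-grams are stored as sets, each URL is effectively a Boolean feature vector, which is the setting in which Jaccard and Dice are usually compared.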
In this paper, we propose a differential-reward-based online learning algorithm for classifying web pages into predefined topics based on the minimal text available in their URLs. It is then compared with two baseline methods, i.e., Support Vector Machine (SVM) and a state-of-the-art Reinforcement Learning Algorithm, using recall, precision and F-measure scores. We conducted experiments on large-scale Open...