The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; MapReduce; document similarity; social...
Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends the AdaBoost algorithm to the multi-task setting and uses a multi-task decision stump as its multi-task weak classifier. This allows learning different dependencies between...
The growing number of documents in digital format available on the Web, and the usefulness of their information for different purposes, creates an essential need to organize them. However, this task must be automated in order to save costs and manpower. In the research community, the main approach to this problem is based on the application of machine learning techniques. This article studies the main machine...
To deal efficiently with the spam mail filtering problem, a novel spam filtering algorithm based on locality pursuit projection (LPP) and the least squares version of SVM (LS-SVM) is proposed in this paper. The mail message features are first extracted by the LPP algorithm; the LS-SVM classifier is then used to classify mails as spam or legitimate. Experimental results demonstrate that the proposed algorithm...
Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the email sender. A fast and accurate classifier is desirable in such an application. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies...
Text classification usually uses statistical methods to select features. When it is carried out across different domains, the specific internal knowledge relationships between domains are not considered. In this paper, a new text classification model is proposed, which is based on domain knowledge relations. This model adopts the support vector machine learning algorithm, combine statistic...
With the rapid growth of computer technology and the popularization of the Internet, e-mail has become an economical and convenient form of communication. However, various types of crime and civil action involving e-mail documents have appeared, harming people's lives and social stability. The authorship of criminal e-mails therefore has to be identified automatically for the purposes of computer forensics. To solve...
The document similarity measure is a key element in textual data processing and largely determines the performance of a processing system. For a decade, kernels have been used as similarity functions within inner-product-based algorithms such as the SVM for NLP problems, especially text categorization. In this paper, we present a semantic space constructed from latent concepts. The concepts...
This work implements an enhanced hybrid classification method through the utilization of the naive Bayes approach and the support vector machine (SVM). In this project, the Bayes formula was used to vectorize (as opposed to classify) a document according to a probability distribution reflecting the probable categories that the document may belong to. The Bayes formula gives a range of probabilities...
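The vectorization step described in the abstract above can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with Laplace smoothing, which the abstract does not specify; the function names `train_naive_bayes` and `bayes_vectorize` and the toy data are hypothetical. The sketch only shows how the Bayes formula can turn a document into a vector of category probabilities; in a hybrid method of this kind, such vectors would then be passed to an SVM as features rather than used to pick a class directly.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-category document and word counts.

    labeled_docs: iterable of (tokens, category) pairs.
    Returns (class_counts, word_counts, vocab).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, category in labeled_docs:
        class_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def bayes_vectorize(tokens, model):
    """Map a tokenized document to a posterior distribution over categories.

    Assumption: multinomial likelihood with Laplace (add-one) smoothing.
    Returns (categories, probabilities), with categories sorted by name.
    """
    class_counts, word_counts, vocab = model
    categories = sorted(class_counts)
    total_docs = sum(class_counts.values())
    log_scores = []
    for category in categories:
        total_words = sum(word_counts[category].values())
        # log prior P(category)
        score = math.log(class_counts[category] / total_docs)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                count = word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores.append(score)
    # Normalize in log space so the vector sums to 1 without underflow.
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    norm = sum(weights)
    return categories, [w / norm for w in weights]


# Hypothetical toy corpus, for illustration only.
training = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train_naive_bayes(training)
categories, vector = bayes_vectorize("buy cheap pills".split(), model)
```

Here `vector` holds one probability per entry in `categories`; the document is represented by the whole distribution instead of only its most likely category.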
Electronic mail (e-mail) makes it possible to communicate with many people easily and cheaply. Although e-mail has brought great convenience, it has also brought the trouble of managing the large quantities of spam received every day. Without appropriate countermeasures, the situation seems to be worsening, and spam e-mail will eventually undermine the usability of e-mail. To efficiently...