The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Through research on the calculation method of feature words' weight in texts and semantic similarity between words, we proposed a calculation method of feature words' weight based on concept weight for the semantic association phenomenon of text features and the prevalence of high-dimensional problem in a text vector space model. This method reduces the semantic loss of the feature set and the dimension...
This paper is for text categorization of Enron email corpus, we use the information bottleneck (IB) method to cluster the key words based on their distribution on different class labels, then we use threads and address groups as additional features to email texts, and the maximal entropy model to improve the accuracy of the classifier. Our experimental results shows that these measures can improve...
Concerning the requirement of e-mail filtering to improve the efficiency and accuracy in e-mail mining, topic detection, and many other specific applications, learnt from traditional spam filtering methods, an approach based on feature analysis and text classification is proposed. Utilizing some structural features which are very likely to identify an irrelevant e-mail, such as group sending, embedded...
Technology of automatic text summarization plays an important role in information retrieval and text classification, and may provide a solution to the information overload problem. Text summarization is a process of reducing the size of a text while preserving its information content. This paper proposes a sentences clustering based summarization approach. The proposed approach consists of three steps:...
Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the email sender. A fast and accurate classifier is desirable in such an application. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies...
Sem@ntica is a system for extracting the information contained in collections of documents into a knowledge base. It combines high quality conventional named entity analysis with an ontology class labeling capability for open class words. The ontology comprises an upper ontology and one or more domain ontologies. The system has tools for rapidly designing the ontology and mapping segments of Word...
With the rapid growth in computer technology and popularization of Internet, e-mail has become one economical and convenient form of communication. But different types of crime and civil action involving e-mail documents appear which do harm to people's life and social's stabilization. So the criminal e-mail's authorship has to be identified automatically for the purpose of computer forensic. To solve...
This paper presents a comprehensive approach to identifying protein name in biomedical texts. The new method integrated the generalized Winnow algorithm and the heuristic rules to implement of initial detection of protein name. Moreover, the system introduced a statistic method to analyses the reliability of recognized protein boundary, which can be then used for expanding protein boundary which has...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.