This paper presents a detailed study of technologies based on Hadoop and MapReduce available over the cloud for large-scale data mining and predictive analytics. Although some studies may have shown that cloud technologies relying on the MapReduce framework do not perform as well as parallel database management systems, e.g., with ad hoc queries and interactive applications, MapReduce has still been...
This paper addresses text categorization of the Enron email corpus. We use the information bottleneck (IB) method to cluster the keywords based on their distribution over the class labels, then use threads and address groups as additional features alongside the email text, and apply a maximum entropy model to improve the accuracy of the classifier. Our experimental results show that these measures can improve...
To meet the need for e-mail filtering that improves efficiency and accuracy in e-mail mining, topic detection, and many other specific applications, an approach based on feature analysis and text classification is proposed, drawing on traditional spam filtering methods. Utilizing some structural features which are very likely to identify an irrelevant e-mail, such as group sending, embedded...
The existence of vast amounts of unstructured text and the importance of textual information have made text mining a hot research topic in data mining. Text classification is a very important subtask of text mining. This paper focuses on Chinese text classification based on single Chinese character features. The experimental results indicate that the feature selection based on single...
Online deception is disrupting our daily lives, organizational processes, and even national security. Existing approaches to online deception detection follow a traditional paradigm by using a set of cues as antecedents for deception detection, which may be hindered by ineffective cue identification. Motivated by the strength of statistical language models (SLMs) in capturing the dependency of words...