This paper addresses text categorization of the Enron email corpus. We use the information bottleneck (IB) method to cluster keywords based on their distribution over the class labels, then use threads and address groups as additional features of the email texts, and apply a maximum entropy model to improve the accuracy of the classifier. Our experimental results show that these measures can improve...
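The abstract does not say which IB variant is used. As a hedged illustration, the sketch below uses the greedy agglomerative form of IB. Each keyword starts as its own cluster, described by its class distribution p(c|w). The algorithm then repeatedly merges the pair whose merge loses the least information about the class labels. That loss is (p(w_i) + p(w_j)) times the weighted Jensen-Shannon divergence between the two class distributions. The function names and the toy word/class counts are hypothetical and are not taken from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def merge_cost(pw_a, pc_a, pw_b, pc_b):
    """Information about the classes lost by merging clusters a and b.

    Cost = (p(a) + p(b)) * JS_pi(p(c|a), p(c|b)), with weights pi proportional to p(a), p(b).
    Also returns p(c | merged cluster), the weighted mixture of the two distributions.
    """
    total = pw_a + pw_b
    pi_a, pi_b = pw_a / total, pw_b / total
    mixed = [pi_a * x + pi_b * y for x, y in zip(pc_a, pc_b)]
    js = pi_a * kl(pc_a, mixed) + pi_b * kl(pc_b, mixed)
    return total * js, mixed

def agglomerative_ib(counts, n_clusters):
    """Hypothetical helper: greedily merge words with similar class distributions.

    counts: dict mapping word -> list of co-occurrence counts with each class label
            (every word is assumed to have a nonzero total count).
    Returns a list of clusters, each a list of words.
    """
    grand = sum(sum(v) for v in counts.values())
    # Each cluster is (member words, p(cluster), p(class | cluster)).
    clusters = [([w], sum(v) / grand, [x / sum(v) for x in v]) for w, v in counts.items()]
    while len(clusters) > n_clusters:
        best = None
        # Scan every pair and keep the merge with the smallest information loss.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, mixed = merge_cost(clusters[i][1], clusters[i][2],
                                         clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j, mixed)
        _, i, j, mixed = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1], mixed)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [c[0] for c in clusters]

# Hypothetical counts over two class labels (e.g. "scheduling" vs "trading").
example = {"meeting": [8, 1], "schedule": [9, 1], "gas": [1, 9], "pipeline": [1, 8]}
print(agglomerative_ib(example, 2))
```

Here "meeting"/"schedule" and "gas"/"pipeline" land in separate clusters because the class distributions within each pair are nearly identical. Scanning all pairs on every merge makes this cubic in the vocabulary size, so the sketch only suits small keyword sets.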
Determining the structure of a Bayesian network, especially in wide domains, can often be complex, time-consuming and imprecise. The scientific community's interest in learning Bayesian network structure from data is therefore increasing: many techniques and disciplines, such as data mining, text categorization and ontology building, can take advantage of structural learning. In the literature...
Document clustering is a widely studied problem in text categorization. It is the process of partitioning a given set of documents into disjoint clusters such that documents in the same cluster are similar. K-means, one of the simplest unsupervised learning algorithms, solves the well-known clustering problem in a simple and easy way, classifying a given data set through a certain number...
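Because the abstract is cut off, the sketch below shows only the standard Lloyd's K-means iteration applied to document vectors, not necessarily the paper's variant. Each document is assigned to its nearest centroid by squared Euclidean distance, and each centroid is then moved to the mean of its documents. These two steps repeat until no document changes cluster. The term-count vectors and the choice of initial centroids are hypothetical. Practical systems usually start from TF-IDF vectors and random or k-means++ initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, initial_centroids, max_iter=100):
    """Lloyd's K-means: alternate assignment and centroid-update steps until stable.

    Returns (cluster index for each vector, final centroids).
    """
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no document changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned documents.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical term counts over the vocabulary ["price", "market", "goal", "match"].
docs = [[3, 2, 0, 0], [4, 3, 0, 1], [0, 0, 3, 4], [1, 0, 2, 3]]
labels, centers = kmeans(docs, initial_centroids=[docs[0], docs[2]])
print(labels)   # → [0, 0, 1, 1]
print(centers)  # → [[3.5, 2.5, 0.0, 0.5], [0.5, 0.0, 2.5, 3.5]]
```

The result depends on the initial centroids, which is why K-means is commonly restarted from several initializations and the run with the lowest within-cluster distance is kept.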
With the growth of text comments, text sentiment classification has become a hot issue. Domestic research on Chinese sentiment classification mainly focuses on segmentation and feature selection, or on statistics-based classification algorithms. Rule mining is an important technique in text classification. This paper proposes a new approach that applies rule mining by...
Clustering techniques have been used by many intelligent software agents in order to retrieve, filter, and categorize documents available on the World Wide Web. Clustering is also useful in extracting salient features of related Web documents to automatically formulate queries and search for other similar documents on the Web. Traditional clustering algorithms either use a priori knowledge of document...
With the development of the Web, large numbers of documents are being published on the Internet, and more and more digital libraries, news sources and internal company data are becoming available. Automatic text categorization is increasingly important for dealing with such massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM). The result of text preprocessing...
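The abstract is truncated before it describes its own preprocessing, so the following is only a generic sketch of how raw text typically becomes a VSM representation. The text is lowercased and tokenized, stop words are removed, each document is weighted with TF-IDF (idf = log(N / df)), and documents are compared by cosine similarity. The tokenizer regex, the tiny stop-word list and the example sentences are hypothetical assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}

def preprocess(text):
    """Lowercase, keep runs of ASCII letters as tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf * idf} vector, with idf = log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = ["The price of oil is rising", "Oil price and market news", "The match ended in a draw"]
vecs = tfidf_vectors(docs)
print(preprocess(docs[0]))        # → ['price', 'oil', 'rising']
print(cosine(vecs[0], vecs[2]))   # → 0.0 (no shared terms)
```

Each step here (tokenization rule, stop list, weighting formula) changes which dimensions the VSM ends up with. That sensitivity is one reason preprocessing weighs so heavily on the final categorization result.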
Web research in Mexico has been addressing issues related mainly to search mechanisms, information extraction, and mediating user interaction and group collaboration. In this paper we provide an overview of representative projects in the area and present a sample of recent advances by research groups in Mexican institutions. These include initiatives aimed at exploring extraction techniques that regard...
Text classification is one of the practices of knowledge discovery. Designing the classifier is the most important part of text classification. Compared with methods based on statistical theory, classification based on rule learning performs better in some situations. A granular computing approach is proposed to learn rules by constructing a granule network while classifying texts. The algorithm...
Text classification is a very important technique for gathering Web information. A novel approach based on multi-population collaborative optimization is proposed for extracting Web text classification rules. Information entropy is applied to initialize the populations, and multi-population collaborative optimization is applied to evolve them. The...
A concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in many applications such as information retrieval, document browsing and document classification. One important task in constructing a concept hierarchy is identifying suitable terms so that the domain vocabulary has an appropriate size. One way of achieving such a size is by using...
Port state control (PSC) inspection is the most important mechanism for ensuring maritime safety worldwide. Recently, several SVM-based risk assessment systems have been proposed. They estimate the risk of each candidate ship from its generic factors and historical inspection factors in order to select high-risk ships before conducting on-board PSC inspections. However, how to improve the performance of the...