Text categorization (TC) is an important component of many information organization and information management tasks. TC problems often involve a very large number of instances, which demand considerable computation time and memory. This paper proposes a Generalization Capability (GC) algorithm that achieves the highest average generalization accuracy in these experiments, especially in the presence of...
Feature weighting is an important issue in text categorization. In this paper we analyze the characteristics of rough set theory and TF-IDF, and propose a feature weighting scheme for text categorization that applies the approximation accuracy and approximation quality of the variable rough set model. The decision information that a feature carries for categorization is introduced into the weight, which reflects the...
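For orientation, the plain TF-IDF weighting that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook formula only (relative term frequency times the log of inverse document frequency); it is not the rough-set-based scheme the paper proposes, and the function name `tf_idf` and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook TF-IDF only: weight = (count / doc length) * log(N / df).
    This is NOT the rough-set weighting proposed in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["rough", "set", "text"],
    ["text", "categorization", "text"],
]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "text" here, receives weight zero. Plain TF-IDF also ignores class labels entirely, which is the gap the abstract describes filling with decision information.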
Text classification is an important research area in Chinese information processing. Its goal is to analyze the content of a text and assign it to one or more appropriate classes, in order to improve the efficiency of applications such as text retrieval, storage, and processing. In this paper, the text dataset is transformed into an information system without a decision attribute...
Feature selection is a key technique in text categorization. This paper proposes a text feature selection method based on an improved mutual information measure and a genetic algorithm. The improved mutual information algorithm first makes an initial selection that removes redundant and noisy words, and the genetic algorithm is then used to train the template generated by a subset...
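As a point of reference, the standard pointwise mutual information score between a term and a class, often used as a first-pass filter in text feature selection, can be sketched as below. This shows only the conventional, unimproved measure estimated from document counts; the paper's improved variant and its genetic algorithm stage are not reproduced, and the function name `mutual_information` and the toy data are assumptions made for the example.

```python
import math


def mutual_information(docs, labels, term, cls):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c))).

    Probabilities are estimated from document counts. This is the
    standard measure, not the improved variant from the paper.
    """
    n = len(docs)
    # Documents of class `cls` that contain `term`.
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    n_term = sum(1 for d in docs if term in d)
    n_cls = sum(1 for y in labels if y == cls)
    if both == 0:
        # The term never co-occurs with the class in this sample.
        return float("-inf")
    return math.log(both * n / (n_term * n_cls))


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "match"},
]
labels = ["sport", "sport", "politics", "politics"]

score_goal = mutual_information(docs, labels, "goal", "sport")
score_match = mutual_information(docs, labels, "match", "sport")
score_vote = mutual_information(docs, labels, "vote", "sport")
```

In this toy corpus "goal" appears only in sport documents and scores positively, while "match" is spread evenly across both classes and scores zero, so a filter that keeps high-scoring terms would discard it.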
Considering the statistical text classification problem, we approximate class-conditional probability distributions by structurally modified Poisson mixtures. By introducing the structural model, we can use different subsets of input variables to evaluate the conditional probabilities of different classes in the Bayes formula. The method is applicable to document vectors of arbitrary dimension without any...
The efficiency of feature selection affects overall classifier performance in text categorization. Combining the indiscernibility capability of rough set theory with the good generalization ability of support vector machines, this paper proposes a new classification method named Rough Support Vector Machine. Rough set theory is employed as an attribute reduction tool to work on the original...
High-dimensional data are frequently encountered in Web text classification. Mining high-dimensional data is extraordinarily difficult because of the curse of dimensionality, so feature dimensionality reduction is needed. This paper presents an attribute reduction algorithm based on rough set theory that reduces the number of text feature terms and extracts rules. First, the weight...