Aiming at the ever-present problem of imbalanced data in text classification, the authors study several forms of imbalance, such as the number of texts, class size, subclasses and class fold. A series of related experiments yields some useful conclusions: first, when two classes contain almost the same number of texts, the difference in word count becomes the major factor affecting accuracy...
Automatic text categorization has been one of the hotspots in the information processing field. Given the important impact of feature weight calculation on text classification accuracy, the relationship between the text representation model and feature weight calculation is first studied and the existing feature weighting methods are analyzed; then the common idea of feature weighting...
Feature selection is a key step in text categorization, and its results directly influence classification accuracy. A feature selection method usually adopts an evaluation function to score each feature word, and the feature words whose scores exceed a set threshold are retained as the final feature subset. The threshold is therefore an important factor in feature selection...
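The threshold step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name its evaluation function, so document frequency is used here as a stand-in, and the names `document_frequency` and `select_features`, the toy corpus, and the threshold value are all hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Assumed evaluation function: score each word by the number of
    documents it occurs in (the abstract does not specify one)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word at most once per document
    return df


def select_features(scores, threshold):
    """Keep the words whose score is strictly higher than the threshold."""
    return {word for word, score in scores.items() if score > threshold}


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["stock", "market", "falls"],
    ["stock", "prices", "rise"],
    ["team", "wins", "match"],
    ["market", "stock", "report"],
]

scores = document_frequency(docs)
features = select_features(scores, threshold=1)
```

Raising the threshold shrinks the retained subset and lowering it admits more words, which is why the choice of threshold matters as much as the choice of evaluation function.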
Imbalanced data sets significantly degrade the classification performance attainable by most standard machine learning algorithms. However, samples are often imbalanced in practice. Therefore, reducing the effect of unevenly distributed training sets on text classification performance is a great challenge for machine learning on imbalanced data sets. Currently, the study on imbalanced data...
Recently, automatic text categorization has made rapid progress and become one of the hotspots in the information processing field. Text tendency classification is one type of text categorization, with very important applications in information retrieval, bad information identification and filtering, content security management and analysis of public opinion tendency. Aiming at the important influence...
Given the importance of analyzing public opinion on the Internet, the authors propose a high-performance extraction method for public opinion. In this method, the space model for classification is adopted to describe the relationship between words and categories. A combined feature selection method is used to effectively remove noisy words from the original feature space. Then the category...