In this paper we present the results of research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution...
Natural language processing is an important part of data mining, which matters a great deal in the internet age. Feature extraction affects the accuracy of text classification. This paper proposes a method of iterative feature space evolution to optimize the result. By adjusting the extended dictionary and the stop word list, we repeatedly optimize the feature space to obtain a better classifier model. The final...
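As a rough illustration of what iteratively adjusting a stop word list can look like, the following minimal Python sketch rebuilds a bag-of-words feature space until the stop word list stops changing. It is not the authors' method: the abstract does not give their update rule, so the document-frequency criterion, the `max_doc_fraction` threshold, and all function names here are assumptions made for the example, and the extended dictionary step is left out.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_features(docs, stop_words):
    """Return one bag-of-words Counter per document, skipping stop words."""
    return [
        Counter(tok for tok in tokenize(doc) if tok not in stop_words)
        for doc in docs
    ]


def refine_stop_words(features, stop_words, max_doc_fraction=0.8):
    """Assumed rule: add terms found in more than max_doc_fraction of documents."""
    doc_freq = Counter()
    for bag in features:
        doc_freq.update(bag.keys())  # count each term once per document
    limit = max_doc_fraction * len(features)
    return stop_words | {term for term, n in doc_freq.items() if n > limit}


def evolve_feature_space(docs, stop_words=frozenset(), max_rounds=10):
    """Alternate feature extraction and stop word updates until stable."""
    stop_words = set(stop_words)
    for _ in range(max_rounds):
        features = build_features(docs, stop_words)
        updated = refine_stop_words(features, stop_words)
        if updated == stop_words:  # nothing new to remove
            break
        stop_words = updated
    return build_features(docs, stop_words), stop_words


docs = ["the cat sat", "the dog ran", "the bird flew"]
features, stop_words = evolve_feature_space(docs)
print(stop_words)
print(features[0])
```

In a fuller setup the update rule would more plausibly be driven by classifier performance on held-out data than by document frequency alone; the loop structure would stay the same.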
The paper addresses the automated categorization of textual information, which can be applied in systems intended to block inappropriate content. An approach to feature selection and construction is proposed. The text mining methods used in the research (Decision Tree classifiers) are analyzed. In addition, the techniques of Web site analysis that provide information in different...
Text classification continues to be one of the most researched problems due to the continuously increasing amount of electronic documents and digital data. Classifying documents into closely related categories is the most complex task in text categorization. Feature selection is an essential preprocessing step for improving the efficiency and accuracy of text classifiers by removing redundant and...