One key step in text mining is the categorization of texts, i.e., grouping texts with the same or similar content so as to distinguish texts with different content. However, traditional statistical approaches based on word frequency, such as the vector space model (VSM), fail to reflect the complex meaning of texts. This paper introduces domain ontology and constructs a new conceptual vector space model in...
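To illustrate the limitation of word-frequency vectors, the following minimal sketch contrasts a word-frequency vector with a concept vector. The `TOY_ONTOLOGY` dictionary and all function names are hypothetical; the mapping is a toy stand-in that sends synonyms to a shared concept, not the domain ontology or the conceptual model constructed in the paper.

```python
import math
from collections import Counter

# Hypothetical toy "ontology": maps surface words to a shared concept label.
# This is an illustrative assumption, not the ontology used in the paper.
TOY_ONTOLOGY = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "buy": "PURCHASE",
    "purchase": "PURCHASE",
}


def word_vector(text: str) -> Counter:
    """Traditional VSM: raw frequencies of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def concept_vector(text: str) -> Counter:
    """Replace each token by its concept (unmapped tokens are kept), then count."""
    return Counter(TOY_ONTOLOGY.get(tok, tok) for tok in text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


d1, d2 = "buy car", "purchase automobile"
print(cosine(word_vector(d1), word_vector(d2)))        # 0.0: no shared words
print(cosine(concept_vector(d1), concept_vector(d2)))  # 1.0: same concepts
```

The two phrases share no surface words, so their word-frequency cosine is 0, while mapping synonyms to the same concept makes their vectors identical.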
Text categorization is an important text-mining task that enables the automated classification of large numbers of documents. Many useful supervised learning methods have been introduced to the field of text classification. Among them, the K-Nearest Neighbor (KNN) algorithm is widely used and is considered one of the best text classifiers because of its simplicity and efficiency. For text categorization,...
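The KNN idea can be sketched as follows, assuming raw term-frequency vectors, cosine similarity, and unweighted majority voting among the k nearest training documents. These choices, the toy training set, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query, labeled_docs, k=3):
    """Return the majority label among the k training documents most similar to query."""
    q = tf_vector(query)
    scored = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in labeled_docs),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


train = [
    ("stock market shares fell", "finance"),
    ("bank raised interest rates", "finance"),
    ("shares and bonds rallied", "finance"),
    ("team won the football match", "sports"),
    ("player scored in the match", "sports"),
    ("coach praised the team", "sports"),
]
print(knn_classify("the team lost the match", train))  # sports
print(knn_classify("bank shares fell", train))         # finance
```

KNN needs no training phase beyond storing the labeled vectors, which accounts for its simplicity; the cost is paid at query time, when the query is compared with every training document.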
The rapid development of the Internet has drawn much attention to Internet public opinion, so it is important to grasp Internet public opinion in a timely manner and to understand its trends correctly. Text mining plays a fundamental role in categorizing and monitoring Internet public opinion, but processing Internet public opinion is much more difficult than processing pure text because of its semi-structured characteristics...
The traditional weighting schemes used in text categorization for the vector space model (VSM) cannot exploit information intrinsic to texts obtained through online handwriting recognition or any OCR process. In particular, the top n (n > 1) recognition candidates cannot be used without flooding the resulting text with false occurrences of spurious terms. In this paper, an improved weighting scheme...
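For reference, a minimal sketch of traditional TF-IDF weighting (term frequency times log(N / df)) follows. It assumes whitespace tokenization and is only one common variant of the traditional schemes referred to above; it is not the improved scheme proposed in the paper, whose details are not given here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


weights = tfidf(["the cat sat", "the dog sat", "the dog barked"])
print(weights[0])  # "the" gets weight 0.0 because it occurs in every document
```

If each of the top n recognition candidates were simply appended to the text as an ordinary token, every spurious candidate would receive a term frequency just like a correctly recognized word, which is the flooding of false occurrences described above.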
The rapid development of the Internet has drawn much attention to Internet public opinion. To address this issue, we propose a novel system for categorizing and monitoring Internet public opinion. Because of the text format of Internet public opinion and the semantic relationships between words in such documents, we introduce latent semantic analysis (LSA) to represent public-opinion documents. Compared to the...
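Latent semantic analysis represents each document by its coordinates along the top k singular directions of the term-document matrix, so documents that use different but related terms can end up close together. The sketch below is self-contained and rests on stated assumptions: it finds those directions with power iteration and deflation on A^T A instead of a library SVD, and the function names, the toy matrix, and the parameter values are hypothetical, not the paper's system.

```python
import math
import random


def matvec(m, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def lsa_doc_vectors(a, k, iters=200, seed=0):
    """Map each document (a column of the term-document matrix a) to k latent coordinates.

    The top-k eigenpairs (sigma_i^2, v_i) of the Gram matrix A^T A are found by
    power iteration with deflation; document j is represented as
    [sigma_1 * v_1[j], ..., sigma_k * v_k[j]], i.e. the columns of Sigma_k V_k^T.
    """
    n_terms, n_docs = len(a), len(a[0])
    # Gram matrix G = A^T A (document-by-document inner products).
    g = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n_docs)]
        for _ in range(iters):
            w = matvec(g, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(g, v)))  # Rayleigh quotient ~ sigma^2
        components.append((math.sqrt(max(lam, 0.0)), v))
        # Deflation: subtract the found component so the next pass finds the next one.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n_docs)] for i in range(n_docs)]
    return [[sigma * vec[j] for sigma, vec in components] for j in range(n_docs)]


# Rows are terms, columns are documents (hypothetical counts).
a = [
    [1, 0, 0],  # "car"
    [0, 1, 0],  # "automobile"
    [1, 1, 0],  # "engine"
    [0, 0, 2],  # "flower"
]
vecs = lsa_doc_vectors(a, k=2)
print(vecs)  # documents 0 and 1 get the same latent vector; document 2 is orthogonal to them
```

In the original term space the first two documents share only "engine" (cosine 0.5); in the two-dimensional latent space their vectors coincide, because the rank-2 truncation discards the direction that separates "car" from "automobile". A practical system would normally use a library SVD rather than this power iteration.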
With the development of the Web, large numbers of documents are being published on the Internet, and more and more digital libraries, news sources, and internal company data are becoming available. Automatic text categorization is therefore increasingly important for handling massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM). The result of text preprocessing...
Text representation, a fundamental and necessary process for text-based intelligent information processing, includes determining the index terms for documents and producing the numeric vectors that correspond to them. In this paper, the multi-word, which is regarded as containing more contextual semantics than an individual word and as possessing favorable statistical characteristics,...
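As a rough illustration of using multi-words as index terms, the sketch below treats any adjacent word pair that occurs at least `min_count` times in a corpus as a multi-word and adds it to a document's single-word terms. The frequency threshold, the restriction to word pairs, and the function names are assumptions; the abstract does not say how the paper extracts multi-words.

```python
from collections import Counter


def frequent_bigrams(docs, min_count=2):
    """Collect adjacent word pairs that occur at least min_count times across docs."""
    counts = Counter()
    for doc in docs:
        toks = doc.lower().split()
        counts.update(zip(toks, toks[1:]))
    return {f"{w1} {w2}" for (w1, w2), c in counts.items() if c >= min_count}


def index_terms(doc, multiwords):
    """Single-word terms plus any known multi-words that appear in the document."""
    toks = doc.lower().split()
    pairs = (f"{w1} {w2}" for w1, w2 in zip(toks, toks[1:]))
    return toks + [p for p in pairs if p in multiwords]


corpus = ["text mining helps search", "we study text mining", "search engines index text"]
mw = frequent_bigrams(corpus)
print(mw)  # {'text mining'}
print(index_terms("text mining helps search", mw))
```

Keeping the single words alongside the multi-word means the document still matches queries on "text" or "mining" alone, while "text mining" becomes an additional, more specific index term.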