Feature selection for text classification is a well-studied problem whose goals are to improve classification effectiveness, computational efficiency, or both. In this paper, we propose a two-stage feature selection algorithm based on a kind of feature selection method and latent semantic indexing. Traditional word-matching based text categorization systems use the vector space model to represent the...
This paper compares the performance of linear and nonlinear kernels of Support Vector Machines (SVM) used for text classification. The study is motivated by the earlier view that a linear SVM performs better than a nonlinear one, and by the observation that, although many investigations have shown that SVM performs well in text classification, there has been no serious investigation of the comparison between...
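As a minimal illustration of the distinction between a linear and a nonlinear kernel, the sketch below defines both as plain functions over dense feature vectors. The RBF (Gaussian) kernel and the `gamma` value are assumptions chosen for illustration; the abstract does not say which nonlinear kernels the paper evaluates.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel, one common nonlinear choice.

    gamma is a hypothetical setting for this sketch, not a value from the paper.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two toy term-weight vectors (hypothetical data).
doc_a = [1.0, 2.0]
doc_b = [3.0, 4.0]
print(linear_kernel(doc_a, doc_b))
print(rbf_kernel(doc_a, doc_b))
```

The linear kernel grows with the magnitude of the vectors, while the RBF kernel depends only on the distance between them and is always in the range (0, 1].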
Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case base grows at a fast rate, which makes case retrieval inefficient. Case-Base Maintenance learning via the GC (Generalization Capability) algorithm, which can reduce the number of cases passed to the KNN algorithm, can improve efficiency...
Based on complex network theory, we propose a clustering algorithm based on content similarity. First, the Chinese documents are represented by the vector-space model, and the content similarity between any two documents is computed as their cosine similarity. Accordingly, each network node is defined as a document, and each edge weight is defined as the similarity obtained by the cosine similarity...
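The construction described in this abstract (documents as nodes, cosine similarity as edge weight) can be sketched roughly as follows. This is only an illustration under stated assumptions: documents are taken to be already tokenized, raw term counts stand in for whatever term weighting the paper uses, and the function names are hypothetical. The clustering step itself is not shown because the abstract is truncated before describing it.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Represent a tokenized document as a sparse term-count vector."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_network(docs):
    """Return weighted edges {(i, j): similarity} between document nodes.

    Nodes are document indices; pairs with zero similarity get no edge.
    """
    vectors = [tf_vector(tokens) for tokens in docs]
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            weight = cosine_similarity(vectors[i], vectors[j])
            if weight > 0:
                edges[(i, j)] = weight
    return edges


# Hypothetical pre-tokenized documents.
docs = [["text", "mining"], ["text", "mining"], ["network"]]
print(build_similarity_network(docs))
```

Dropping zero-weight pairs keeps the network sparse; a real system might instead apply a similarity threshold, which is a design choice the abstract does not specify.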
Several features of Chinese texts create a technological bottleneck in Chinese text mining, and at present the results of Chinese text clustering obtained by traditional methods are not very satisfactory. In this paper, we propose applying an English text clustering method called Text Clustering via Particle Swarm Optimizer (TCPSO) to solve the Chinese text clustering...
Since its emergence, the blog has represented not only a new network technology but also the beginning of a new lifestyle. How to utilize and mine blog content, which contains hidden sentiment and is updated in real time, is a big challenge in the data-mining domain. As most existing methods for mining the topics of network text work by clustering the text's topic and label, which are labeled...
To account for both intuitive human understanding and the information-processing requirements of text sentiment, this paper describes a new formal computational model of Chinese text sentiment. The model includes two parts: formal presentation and formal computing. The former is based on seven basic sentiment categories, namely “happy”, “anger”, “sadness”, “fear”, “love”, “hate” and “desire”,...