In this paper, we present a new mathematical model based on the “Vector Space Model” and consider its implications. The proposed method is evaluated in several experiments, in which we use it to classify newspaper articles from the English Reuters-21578 data set and the Taiwanese China Times 2005 data set. The Reuters-21578 data set is a benchmark data set for automatic...
In this paper, we propose a framework for answering opinion-type questions. The data source is the set of web pages returned by a search engine. Using a Bayes classifier, the main text of each page is classified at the sentence level into three categories: positive review, negative review, and neutral review. The K-means method is then used to cluster the positive-review sentences and the negative-review sentences separately...
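The sentence-level classification step described in this abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier with add-one smoothing and whitespace tokenization. The abstract does not specify the Bayes variant or the features, so the class name `NaiveBayes` and the toy training sentences are hypothetical, and the K-means clustering step is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative assumption)."""

    def __init__(self):
        self.class_counts = Counter()            # number of training sentences per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for word in sentence.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in sentence.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from the paper.
train = [
    ("great camera and excellent battery", "positive"),
    ("excellent screen great value", "positive"),
    ("terrible battery and poor screen", "negative"),
    ("poor value terrible support", "negative"),
    ("the camera has a screen", "neutral"),
    ("the box has a manual", "neutral"),
]
model = NaiveBayes()
model.fit([s for s, _ in train], [y for _, y in train])
print(model.predict("great excellent value"))
```

In a pipeline like the one outlined in the abstract, the sentences labeled positive and negative would then be passed to a clustering step, while neutral sentences would be set aside.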
With the development of modern social technology and people's constant pursuit of knowledge, a more diverse and open cyberspace is becoming a reality in the academic field. In this environment, it is more necessary than ever for people to extract comprehensive, efficient, and valuable information from a wide range of cultural information. Therefore, text categorization research has become more important...
A new method for computing the concept of a Chinese text is proposed in this paper. In this method, we construct sentence vectors from the text by extracting and quantifying syntactic and semantic features such as concept elements, dependency relations, and correlative relations. We then combine these sentence vectors into a text vector that represents the text concept. Experimental results show that, in the application...
Text classification is the key technology for topic tracking, and the vector space model (VSM) is one of the simplest and most effective models for topic representation. Feature selection in the VSM is an important data pre-processing step: it reduces the dimension of the vector space and improves the generalization ability of the algorithm. Therefore, it is necessary for feature selection algorithms...
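As a rough illustration of how feature selection shrinks a vector space model, the sketch below keeps only the `k` terms with the highest document frequency and compares documents by cosine similarity. Document frequency is a simple stand-in criterion chosen for this example, not the algorithm from the paper, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; real systems would use a proper tokenizer.
    return text.lower().split()


def select_features(docs, k):
    """Keep the k terms with the highest document frequency (ties broken alphabetically)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_vector(doc, features):
    """Represent a document as term counts over the selected features only."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in features]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: nine distinct terms reduced to three dimensions.
docs = [
    "stock market prices rise",
    "stock market prices fall",
    "football team wins match",
]
features = select_features(docs, 3)
vectors = [to_vector(d, features) for d in docs]
print(features)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Note that the third document maps to an all-zero vector here, which shows one risk of aggressive feature selection: documents whose terms are all discarded can no longer be compared.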
Feature selection and weighting are among the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight...
It is important to reduce the dimensionality of features in Web Chinese text categorization. The Isomap algorithm is an unsupervised manifold learning technique. This paper proposes the SIIsomap algorithm, an extension of Isomap to supervised feature extraction. It uses the adding-constant method and a direct embedding technique of the Isomap algorithm for test data to make the embedding more reasonable...
Style-based text authorship identification extracts features from texts of known authorship, constructs a classifier, and then identifies the authors of disputed texts. Authorship identification belongs to the domain of style classification and is a branch of text classification. In contrast to text classification, which deals with the content of texts, authorship identification focuses on the formal properties of texts...