To address multi-classification and multi-label problems in text categorization, an apery algorithm is proposed that judges whether a document has multiple classifications and multiple labels by estimating the similarity difference among the final classifier values. If the quotient of the largest category's classifier value divided by the second-largest category's classifier value is less than or equal to a threshold, the...
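The ratio test described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract is cut off before it says what happens when the quotient falls at or below the threshold, so treating that case as "assign both top categories" is an assumption, as are the function names, the threshold value, and the requirement that classifier values be positive.

```python
def top_two_ratio(scores):
    """Return the two highest-scoring categories and the quotient of their values.

    `scores` maps category name -> classifier value (assumed positive).
    """
    if len(scores) < 2:
        raise ValueError("need at least two categories")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, best), (second_label, second) = ranked[0], ranked[1]
    if second <= 0:
        raise ValueError("classifier values are assumed to be positive")
    return best_label, second_label, best / second


def assign_labels(scores, threshold=1.2):
    """Hypothetical decision rule built on the quotient test.

    ASSUMPTION: when the quotient is <= threshold, the two top categories are
    considered too close to separate, so the document receives both labels.
    The source abstract is truncated before stating the actual consequence.
    """
    best_label, second_label, ratio = top_two_ratio(scores)
    if ratio <= threshold:
        return [best_label, second_label]
    return [best_label]


if __name__ == "__main__":
    close_scores = {"sports": 0.50, "politics": 0.45, "tech": 0.05}
    clear_scores = {"sports": 0.90, "politics": 0.10}
    print(assign_labels(close_scores))
    print(assign_labels(clear_scores))
```

Because the largest value is divided by the second largest, the quotient is always at least 1 for positive values, so a useful threshold would sit slightly above 1.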
The traditional weighting schemes used in text categorization for the vector space model (VSM) cannot exploit information intrinsic to texts obtained through online handwriting recognition or any OCR process. In particular, the top n (n > 1) recognition candidates cannot be used without flooding the resulting text with false occurrences of spurious terms. In this paper, an improved weighting scheme...
Clustering techniques have been used by many intelligent software agents in order to retrieve, filter, and categorize documents available on the World Wide Web. Clustering is also useful in extracting salient features of related Web documents to automatically formulate queries and search for other similar documents on the Web. Traditional clustering algorithms either use a priori knowledge of document...
This paper addresses how to use an incremental training corpus to improve accuracy in question classification based on statistical learning. A question classification method based on incremental modified Bayes is presented. The method uses modified Bayes combined with incremental learning to correct the parameter by the incremental training set stage...
Most Chinese text classification systems are based on the bag-of-words (BW) technique, which is a valid probabilistic tool for text representation and can provide a better semantic architecture. However, its weakness in classification accuracy remains unresolved. The support vector machine (SVM) has become a popular classification tool and can be applied in the scheme, but the main disadvantages...
Considering the statistical text classification problem, we approximate class-conditional probability distributions by structurally modified Poisson mixtures. By introducing the structural model, we can use different subsets of input variables to evaluate conditional probabilities of different classes in the Bayes formula. The method is applicable to document vectors of arbitrary dimension without any...
This paper introduces a novel semi-supervised learning algorithm named linear neighborhood spread (LNS), which is capable of learning manifold structures. Labeled and unlabeled data are represented as vertices in a weighted graph, and each data point is assumed to be linearly reconstructible from its neighborhood. Labels are spread through the edges, and the weighted graph is regarded as probabilistic...
Automatic text summarization compresses an original document into an abridged version by extracting almost all of the essential concepts with text mining techniques. This research focuses on developing a hybrid automatic text summarization approach, KCS, to enhance the quality of summaries. KCS employs the K-mixture probabilistic model to establish term weights in a statistical sense, and further...
Most conventional incremental learning algorithms perform incremental learning by selecting only one optimized text sample each time, which neither considers the relationship between texts in the unlabeled text set, nor improves incremental learning efficiency. In addition, because of the shortage of the classifier's information storage, the selected optimized text is easily classified incorrectly...
The role of text categorization algorithms is to deal with the ever-increasing number of documents, whether online or offline. Their capability to organize numerous documents into pre-defined categories significantly increases efficiency and reduces the human effort required. Recently, the support vector machine (SVM) has gained popularity due to its excellent generalization ability and fast training speed on large...