One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TF*IDF, LSI and multi-word for text representation. We used a Chinese and an English document collection to respectively evaluate the three...
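As a rough illustration of the two tasks named in the abstract above (indexing and weighting), the following minimal sketch computes TF*IDF weights for a tiny tokenized collection. It assumes raw term counts for TF and the plain logarithmic IDF, log(N / df); the abstract does not say which variant the paper uses, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weighting is an assumption:
    raw term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Indexing step: document frequency of every distinct term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        # Weighting step: a term found in every document gets weight 0.
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical toy collection, already tokenized.
docs = [
    ["text", "mining", "text"],
    ["text", "classification"],
    ["latent", "semantic", "indexing"],
]
vectors = tf_idf(docs)
```

Here "text" occurs in two of the three documents, so it receives a lower IDF than "mining", which occurs in only one.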
On large-scale datasets, the performance of automatic text classification is still far from perfect. It is commonly agreed that richer semantic information should be adopted in text representation to meet this challenge. This paper introduces the semantic meaning of coreference to improve the traditional BOW representation. The results of the text classification experiment show that, contrasted...
Text classification plays an important role in information extraction and summarization, text retrieval, and question answering. The discriminative multinomial naive Bayes classifier has been a focus of research in the field of text classification. This paper increases the accuracy of the discriminative multinomial Bayesian classifier by using a feature selection technique that evaluates the...
In multi-instance learning, the training set comprises labeled bags composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a text mining problem, i.e. text representation, is investigated from a multi-instance view. In detail, each text is regarded as a bag, while each of its sentences is regarded as an instance. A bag can be labeled by its class...
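The bag-and-instance view described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Bag` class, the `text_to_bag` helper, and the naive punctuation-based sentence splitter are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """One text viewed as a labeled bag of unlabeled instances."""
    instances: List[str]  # the sentences of the text (unlabeled)
    label: str            # the label attached to the whole text


def text_to_bag(text: str, label: str) -> Bag:
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    return Bag(instances=sentences, label=label)


# Hypothetical example text and class label.
bag = text_to_bag("Stocks fell today. Analysts blame interest rates.", "finance")
```

Only the bag carries a label; the individual sentences stay unlabeled, which is what distinguishes this setting from ordinary supervised learning on sentences.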
Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, current text classification systems are based on the "Bag of Words" (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich...
Text representation, which is a fundamental and necessary process for text-based intelligent information processing, includes the tasks of determining the index terms for documents and producing the numeric vectors corresponding to the documents. In this paper, multi-word, which is regarded as containing more contextual semantics than individual words and possessing favorable statistical characteristics,...