To reduce the human effort of labeling training sets for document classification, some learning algorithms ask users to provide representative keywords for each class rather than labeled documents. The key challenge in such keyword-labeled classification is how to learn a high-quality classifier with
In this paper, we have developed a probabilistic approach using PLSA for the discovery and analysis of contextual keyword relevance based on the distribution of keywords across a training text corpus. We have shown experimentally the flexibility of this approach in classifying keywords into different domains based on
easy to bring the problem of topic excursion. The HITS algorithm requires a set of pages as the base set for its calculation and cannot be used on plain text. This paper introduces a new algorithm, PK-TDC, which makes use of the iterative idea of HITS. PK-TDC searches for authority pages and keywords on the topology of pages
Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when
can be expected to be achieved in a QA system. Sentences are classified according to their content. Each class is further divided into more detailed fields. Important keywords are extracted from the sentences classified into each field. Moreover, the extracted keywords are classified into common and peculiar words for
Social tagging allows users to assign keywords (tags) to resources, facilitating their future access by the tag creator and possibly by other users. In terms of its support for resource discovery, social tagging has both proponents and critics. This paper investigates whether tags are an effective means for
The purpose of this research is to propose a classification approach that improves the effectiveness of spam filtering under skewed class distributions. A clustering-based classifier is proposed that first clusters documents into several groups, and then an equal number of keywords are extracted
events. A huge resource of text-based emotion can now be found on the World Wide Web. This paper reports a study investigating the effectiveness of using an SVM (Support Vector Machine) on linguistic features, considering emotion keywords and negative words, to classify a collection of blog post sentences tagged
This paper introduces a method of automatically constructing a semantic dictionary from the keywords and classification relations of the web encyclopedia Chinese Wikipedia. Semantic units, which are affixes (core/modifier) shared between many phrase keywords, are selected using a statistical method and string affix matching
Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term-frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords
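As a minimal illustration of the term-frequency representation mentioned in the entry above, the following Python sketch builds a document-by-term count matrix. The function name `term_frequency_matrix`, the whitespace tokenization, and the sample documents are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def term_frequency_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in docs[i].

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption and not the paper's preprocessing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives a stable column order.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Illustrative documents (made up for this sketch).
docs = ["web page classification", "web text retrieval web"]
vocabulary, matrix = term_frequency_matrix(docs)
```

Each row of `matrix` can then be passed to a classifier as a feature vector; real systems typically add stop-word removal and weighting such as TF-IDF on top of raw counts.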
In text categorization, vectorizing a document by a probability distribution is an effective dimension-reduction method that saves training time. However, data sets whose categories share many common keywords seriously degrade classification performance. To address that problem, firstly, we conduct an effective
of the classifier. Our experimental results show that these measures can improve the classifier's performance, because keywords change too rapidly in emails while address groups are much steadier.
likelihood over the entire set of training documents, where the training and test data are split randomly into k subsets, such as 2/3 for training and 1/3 for testing. In addition, it utilizes a two-level hierarchy structure for training documents, with features from the title, keywords and content, together with the predefined knowledge available
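The random 2/3 versus 1/3 split mentioned in the entry above can be sketched in Python as follows. This shows only a simple random holdout split; the function name `split_train_test`, the fixed seed, and the sample data are assumptions for illustration, and the truncated abstract does not say how the paper itself forms its subsets.

```python
import random


def split_train_test(items, train_fraction=2 / 3, seed=0):
    """Shuffle items and split them into (train, test) lists.

    Hypothetical helper: a plain random holdout split, assumed here
    for illustration; not the procedure of the cited paper.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Nine placeholder document ids: six go to training, three to testing.
train, test = split_train_test(range(9))
```

Every item lands in exactly one of the two lists, so the test documents are never seen during training.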