This paper presents a corpus-based approach for extracting keywords from a text written in a language that has no word boundaries. Based on the concept of the Thai character cluster (TCC), a Thai running text is first segmented into a sequence of inseparable units, called TCCs. To enable the handling of a large-scaled
To reduce the human effort in labeling the training set for document classification, some learning algorithms ask users to give representative keywords for each class rather than any labeled documents. The key challenge in such keyword-labeled classification is how to learn a high-quality classifier with
The Internet is becoming an increasingly important platform for ordinary life and work. It is expected that keyword extraction can help people quickly find hot spots on the web, since keywords in a document provide important information about the content of the document. In this paper, we propose to use text clustering
A third technique involves extracting keywords and storing them in a properly indexed base. These can then serve the dual purpose of providing solutions to Lazy Learning classification for automatic subject-wise archiving and forming relevant word sequences for detection of plagiarism using Association Rule
Social tagging allows users to assign keywords (tags) to resources, facilitating their future access by the tag creator, and possibly by other users. In terms of its support for resource discovery, social tagging has both proponents and critics. This paper investigates whether tags are an effective means for
cannot provide sufficiently accurate results because of a "lack of words" problem. From this observation, items in the same category always have the same group of terms (or keywords), and the similar locations of these terms in phrases suggest that the items have a high probability of being in the same category. Our new
subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection strategy based on keyword extraction is proposed. It calculates each word's query-related feature through a word co-occurrence window, obtains the topic-related feature through a likelihood ratio, and then combines the two features to extract
This paper introduces a new approach to building a topic digital library using concept extraction and document clustering. First, documents in a special domain are automatically produced by a document classification approach. Then, the keywords of each document are extracted using a machine learning approach. The
attribute labels to them. It can greatly boost the efficiency of text processing. For building up two views, we split features into two parts, each of which can form an independent view. One view is made up of the feature set of abstract, and the other is made up of the feature sets of title, keywords, creator and department
learning approach. We use a graphical model, Dynamic Conditional Random Fields (DCRFs), for training our classifier. Our approach is based on semantic analysis of text to classify the predicates describing coexpression relationships rather than detecting the presence of keywords. We compared our results of sentence
index texts. The traditional BOW matrix is replaced by a "Bag of Concepts" (BOC). For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A support vector machine, a successful machine learning technique, is used for classification. Experimental results show that
Traditional text learning algorithms need labeled documents to supervise the learning process, but labeling documents of a specific class is often expensive and time-consuming. We observe that it is sometimes convenient to use a few keywords (i.e., class-descriptions) to describe a class. However, short class-description