In this information era, the number of websites on the Internet has increased dramatically within a few years. Almost any information or service can be retrieved from a website. However, the most valuable content of a website is still the text related to its topic or category. Only a few studies have focused on categorizing Thai-language information. The rest of researches...
Supervised learning methods are widely used in text sentiment classification. To achieve high classification performance, an effective and precise term weighting scheme plays an essential role in the classification system. Traditional term weighting schemes often ignore the available labeling information as prior knowledge, which results the expressed relationships between...
This paper discusses the identification of the extend relation in scientific papers based on supervised machine learning. Extend relations are identified by classifying each sentence in a scientific paper into an extend category. The extend relation is one type of relation between papers, obtained using a citation-context-based approach. Citation context is a set of words or phrases in a sentence...
The significant growth of online textual information has increased the demand for effective content-based Arabic text categorization methods. The categorization of Arabic texts poses some challenges that need to be addressed, especially when using stemming. In the literature, we found a debate among researchers about the benefits of using stemming in Arabic text categorization. Hence, we performed a study...
Because of the ubiquity of metaphors in language, metaphor processing is a very important task in the field of natural language processing. The first step towards metaphor processing, and probably the most difficult one, is metaphor detection. In the first part of this paper, we review the theoretical background for metaphors and the models and implementations that have been proposed for their detection...
Supervised learning is a popular approach to text classification in the research community as well as in the software development industry. It enables intelligent systems to solve various text analysis problems such as document organization, spam detection and report scoring. However, the extremely difficult and time-intensive process of creating a training corpus makes it inapplicable to many...
Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we propose a Thai Document Categorization Framework focusing on the comparison of various term weighting schemes, including Boolean, tf, tf-idf,...
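The three weighting schemes named in the abstract above can be illustrated with a minimal sketch. It assumes pre-tokenized documents, raw counts for tf, and the common idf form log(N / df); the function name `term_weights` and the sample documents are hypothetical, and the paper's own definitions may differ.

```python
import math
from collections import Counter


def term_weights(docs):
    """Compute Boolean, tf, and tf-idf weights for each tokenized document.

    Assumptions (not taken from the paper): tf is the raw term count and
    idf is log(N / df), where N is the number of documents and df is the
    number of documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: {
                "boolean": 1,  # the term is present in this document
                "tf": count,   # raw frequency in this document
                "tfidf": count * math.log(n_docs / df[term]),
            }
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of two tokenized documents.
docs = [["thai", "news", "news"], ["thai", "sport"]]
w = term_weights(docs)
```

Under these assumptions a term that occurs in every document, such as "thai" here, gets a tf-idf weight of zero, while its Boolean and tf weights stay positive.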
In multi-instance multi-label learning (MIML), each example is not only represented by multiple instances but also associated with multiple labels. Most existing algorithms solve the MIML problem in the intuitive way of identifying its equivalent in a degenerated version of MIML. However, this identification process may lose useful information encoded in training examples and therefore be harmful...
Clustering, an unsupervised learning process, is a challenging problem. The quality of the clustering result improves the overall structure. In this article, we propose an incremental stream of hierarchical clustering that improves the efficiency and accuracy of the text categorization algorithm and reduces its time consumption by forming an exact sub-clustering. In this paper we propose a new method called multilevel clustering...
Extracting useful information from user-generated text on the web is an important ongoing research area in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide the added advantage of anonymity. Millions of people post information on different...
People often want to listen to music that best fits their current emotion. A grasp of the emotions in songs might be a great help in discovering music effectively. In this paper, we aimed at automatically classifying the moods of songs based on lyrics and metadata, and proposed several methods for supervised learning of classifiers. In the future, we plan to use automatically identified moods of songs as metadata...
To address the problem of high dimensionality in text classification, this paper introduces the locally linear embedding (LLE) algorithm for dimension reduction. However, the original LLE does not necessarily minimize the loss of information during reduction, so we first improved it by combining its two loss functions. We then linked the improved LLE with supervised learning and support vector...
Text categorization is an important research field within text mining. The initial objective of text categorization is to recognize, understand and organize various volumes of texts or documents. The general procedures of categorization are treated as supervised learning, in which the similarity can be inferred from a collection of categorized texts used for training purposes. Obviously, the typical approaches...
Traditional text learning algorithms need labeled documents to supervise the learning process, but labeling documents of a specific class is often expensive and time-consuming. We observe that it is sometimes convenient to use a few keywords (i.e., class descriptions) to describe a class. However, a short class description usually does not contain enough information to guide classification. Fortunately, large...
Multi-instance multi-label learning (MIML) deals with the problem where each training example is associated with not only multiple instances but also multiple class labels. Previous MIML algorithms work by identifying its equivalence in degenerated versions of multi-instance multi-label learning. However, useful information encoded in training examples may get lost during the identification process...
Traditional text classification methods make a basic assumption: the training and test sets are homologous. This naive assumption may not hold in the real world, especially in the Web environment. Documents on the Web change over time, and a pre-trained model may be out of date when applied to newly emerging documents. However, some information in the training set is nonetheless useful. In this paper...
In this paper we present two algorithms for shape recognition. Both algorithms map the contour of the shape to be recognized into a string of symbols. The first algorithm is based on supervised learning using string kernels, as often used for text categorization and classification. The second algorithm is very weakly supervised and is based on Procrustes analysis and on the edit distance used for...
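As a rough illustration of how two contour strings might be compared by edit distance, here is a minimal sketch of the standard Levenshtein distance with unit costs for insertion, deletion and substitution. The function name `edit_distance` is hypothetical, and the abstract above does not say which edit-distance variant or cost scheme the paper actually uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences.

    Assumes unit cost for insertion, deletion and substitution; this is
    the textbook variant, not necessarily the one used in the paper.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the dynamic-programming table keeps memory linear in the length of the second string, which can matter when contours are encoded as long symbol strings.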