A prospective buyer interested in a particular item may find out information about the item from various sources, including product reviews. With interactive information sharing facilitated by Web 2.0, a lot of product reviews are available on the web. For a popular item with a large number of reviews, a prospective buyer could use some help in selecting only reviews of interest, such as, only positive...
This paper proposes a novel multiclass classification method and exhibits its advantage in the domain of text categorization with a large label space and, most importantly, when some of the labels were not observed in the training data. The key insight is the introduction of intermediate aspect variables that encode properties of the labels. Aspect variables serve as a joint representation for observed...
Question classification plays a crucial role in question answering systems. Recent research on open-domain question classification mostly concentrates on using machine learning methods to solve this special kind of text classification problem. This paper presents our research on Chinese question classification using machine learning and describes our approach based on SVM and semantic...
Orientation detection is an important preprocessing step for accurate recognition of text from document images. Many existing orientation detection techniques are based on the fact that in Roman script text ascenders are more likely to occur than descenders, but this approach is not applicable to documents in other scripts such as Urdu, Arabic, etc. In this paper, we propose a discriminative learning approach...
In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document...
With the development and widespread use of the Internet and information technology, the Web has become one of the most important means for people to obtain information. Based on Web document classification and the theory of artificial neural networks, this paper presents a Web classification mining method based on a classification support vector machine (SVM). The SVM network structure that is used for...
Automatically classifying text documents is an important field in machine learning. Unsupervised text classification does not need training data but is often criticized for clustering blindly. Supervised text classification needs large quantities of labeled training data to achieve high accuracy. However, in practice, labeled samples are often difficult, expensive or time-consuming to obtain. Meanwhile,...
There are various opinions on the Web, and analyzing them is an important task. Although many previous studies focused on analyzing subjective evaluative expressions, objective evaluative expressions, which describe positive or negative facts, are also informative. In this paper, we study extraction and classification of subjective and objective evaluative expressions in Japanese Web documents...
In this paper, we propose the "democratic classifier", a simple pattern-based classification algorithm that uses very short patterns for classification, and does not rely on the minimum support threshold. Borrowing ideas from democracy, our training phase allows each training instance to vote for an equal number of candidate size-2 patterns. The training instances select patterns by effectively...
This paper presents a new method to identify languages. An LVQ (learning vector quantization) network aimed at language identification is introduced. The presence of particular characters and words and the statistical information of word lengths are used as a feature vector. The new classification technique is faster than the conventional N-gram based classification approach, but it performs similarly...
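The feature vector described in the last abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker sets `MARKER_CHARS` and `MARKER_WORDS`, the function name `language_features`, and the choice of mean and standard deviation as the word-length statistics are all assumptions made for illustration, and the LVQ network itself is not shown.

```python
from statistics import mean, pstdev

# Hypothetical marker sets, chosen only for illustration; the abstract
# does not say which characters or words the authors actually use.
MARKER_CHARS = ["ä", "ñ", "é", "ß"]
MARKER_WORDS = ["the", "der", "el", "le"]


def language_features(text: str) -> list[float]:
    """Build a feature vector from character presence, word presence,
    and word-length statistics (mean and population standard deviation)."""
    lowered = text.lower()
    words = lowered.split()
    # Fall back to a single zero length so the statistics are defined
    # for empty input.
    lengths = [len(w) for w in words] or [0]

    # 1.0 if the marker character occurs anywhere in the text, else 0.0.
    char_flags = [1.0 if c in lowered else 0.0 for c in MARKER_CHARS]

    # 1.0 if the marker word occurs as a whitespace-separated token.
    word_set = set(words)
    word_flags = [1.0 if w in word_set else 0.0 for w in MARKER_WORDS]

    stats = [float(mean(lengths)), float(pstdev(lengths))]
    return char_flags + word_flags + stats


if __name__ == "__main__":
    print(language_features("the cat"))
    print(language_features("der Bär ißt"))
```

A vector of this kind would then be passed to a classifier such as an LVQ network; a fixed-length vector is cheaper to compute than a full N-gram profile, which is consistent with the speed claim in the abstract.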