This paper proposes a novel multiclass classification method and exhibits its advantage in the domain of text categorization with a large label space and, most importantly, when some of the labels were not observed in the training data. The key insight is the introduction of intermediate aspect variables that encode properties of the labels. Aspect variables serve as a joint representation for observed...
Text classification is one of the core applications in data mining due to the huge amount of uncategorized digital data available. Training a text classifier generates a model that reflects the characteristics of the domain. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to...
Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain datasets. More recently, some of these techniques have been extended...
The text representation in text classification is usually a sequence of terms. As the number of terms grows very large, existing text categorization tasks become very time-consuming. In this paper we present a novel text representation model for text classification which greatly reduces the required resources. This model represents text with several features. Each feature corresponds...
This paper presents a new method to identify languages. An LVQ (learning vector quantization) network aimed at language identification is introduced. The presence of particular characters and words, together with statistics on word lengths, forms the feature vector. The new classification technique is faster than the conventional N-gram based classification approach, but it performs similarly...
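As a rough illustration of the kind of feature vector this abstract describes, the sketch below combines character-presence flags, word-presence flags, and word-length statistics. It is a minimal sketch only: the marker lists `MARKER_CHARS` and `MARKER_WORDS`, the function name `feature_vector`, and the choice of mean and maximum word length are assumptions made for illustration, since the abstract does not specify them. The LVQ network that would consume the vector is not shown.

```python
from statistics import mean

# Hypothetical marker sets; the abstract does not say which characters or words are used.
MARKER_CHARS = ["ß", "ñ", "é", "ø"]
MARKER_WORDS = ["the", "der", "el", "le"]


def feature_vector(text):
    """Build a fixed-length feature vector from a text sample.

    Layout: one flag per marker character, one flag per marker word,
    then the mean and the maximum word length (assumed statistics).
    """
    lowered = text.lower()
    # Split on whitespace and strip surrounding punctuation from each token.
    tokens = [w.strip(".,;:!?\"'()") for w in lowered.split()]
    tokens = [w for w in tokens if w]
    token_set = set(tokens)

    char_flags = [1.0 if c in lowered else 0.0 for c in MARKER_CHARS]
    word_flags = [1.0 if w in token_set else 0.0 for w in MARKER_WORDS]

    lengths = [len(w) for w in tokens]
    mean_len = float(mean(lengths)) if lengths else 0.0
    max_len = float(max(lengths)) if lengths else 0.0

    return char_flags + word_flags + [mean_len, max_len]


if __name__ == "__main__":
    print(feature_vector("Der Spaß"))
    print(feature_vector("The cat sat."))
```

A vector of this shape could then be passed to any prototype-based classifier; the fixed length is what makes it cheaper to compute than a full N-gram profile.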