Text categorization is an important task in text mining, used for the automated classification of large numbers of documents. Many useful supervised learning methods have been introduced to the field of text classification. Among these, the K-Nearest Neighbor (KNN) algorithm is widely used and is one of the best text classifiers owing to its simplicity and efficiency. For text categorization,...
In real-world information systems, unlabeled data are abundant but labeled data are sparse. It is challenging to construct an adaptive model that classifies a large number of documents spanning different domains. Classifiers trained on a source domain perform poorly on test data in a target domain because of the domain mismatch. In this study, we build a topic-bridged latent Dirichlet...
Because of its many applications in data mining and information technology, automatic document classification is an important topic in computer science. Classification plays a vital role in many information management and retrieval tasks. Document classification, also known as document categorization, is the process of assigning one or more predefined category labels to a document. Classification...
With an increasing amount of audio and video materials made available on the web, information extraction from multimedia documents is becoming a key area of growing business and technology interest. Research opportunities range from traditional topics, such as multimedia signal representation, processing, coding, modeling, authentication, and recognition, to emerging subjects, such as language modeling,...
Document classification uses different types of word weightings as features for representing documents. We find that the class document frequency, dfc, of a word is the most important feature in document classification. Machine learning algorithms trained with the dfc of words show similar performance, in terms of correct classification of test documents, when compared to more complicated...
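The truncated abstract above does not define dfc. As a minimal sketch, assuming dfc means the number of documents in class c that contain a given word, it could be computed as follows. The function name `class_document_frequency` and the toy documents are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def class_document_frequency(docs, labels):
    """Count, per class, how many documents contain each word.

    Assumption: dfc(word, c) = number of documents labeled c that
    contain the word at least once. The abstract does not spell this out.
    """
    dfc = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        # set() ensures a word counts once per document, not once per occurrence
        dfc[label].update(set(tokens))
    return dfc


# Hypothetical toy corpus: already-tokenized documents and their class labels
docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["cheap", "meeting"],
]
labels = ["spam", "ham", "ham"]

dfc = class_document_frequency(docs, labels)
```

Here "cheap" appears twice in the single spam document but contributes a count of one to that class, which is what separates a document frequency from a raw term frequency.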
Text categorization, or text classification (TC), has recently received increased research attention from the information retrieval and machine learning communities. This focus is driven mostly by the ever-growing demand for effective and efficient content-based document management. In the context of a digital library or Web portal application, the problem of text categorization is normally that of classification...