Web text classification is the process of automatically determining the type of a text under a given classification scheme, according to its content. A web text categorization system draws on machine learning, knowledge engineering and other related fields: it obtains text from the web and, after text preprocessing, Chinese word segmentation and classifier training, using classification algorithm...
In e-commerce transactions, goods are classified according to a hierarchical structure, that is, a category tree. The classification process should take the special features of goods into account. When the brand name is used for categorization, for instance, the brand is a highly discriminative feature. Based on this, we prepare a dictionary of brands for Chinese word segmentation on one hand...
The close visual relation between the style of typographic words in a document on the one hand and the conceptual meaning of 'texture' on the other has been used to propose an approach, based on features extracted with Gabor filters, that classifies words in a document into three classes: regular, italic and bold. Since the generalized Dirichlet distribution (GDD) is shown to be very flexible in image...
Automatic content generation aims at developing an intelligent tutoring system in the Tamil language. This system focuses on delivering personalized content in Tamil tailored to individual users' needs, based on their learning abilities and interests. This paper deals with automatic classification of Tamil documents and also with information extraction from those documents to construct the knowledge base...
This paper presents a corpus-based approach for extracting keywords from a text written in a language that has no explicit word boundaries. Based on the concept of the Thai character cluster, a Thai running text is preliminarily segmented into a sequence of inseparable units, called TCCs. To enable the handling of large-scale text, a sorted sistring (or suffix array) is applied to calculate a number of statistics...
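The suffix-array idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the text has already been segmented into a list of units (the strings in `units` are hypothetical placeholders standing in for TCCs), and the function names `build_suffix_array` and `count_occurrences` are invented for this example. The sketch sorts all suffixes of the unit sequence and then counts how often a unit n-gram occurs by binary search, which is one of the simpler statistics such a structure supports.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(units):
    """Return the start indices of all suffixes of `units`, sorted lexicographically.

    This naive construction is fine for small inputs; large corpora would
    need a more efficient suffix-array algorithm.
    """
    units = tuple(units)
    return sorted(range(len(units)), key=lambda i: units[i:])


def count_occurrences(units, suffix_array, pattern):
    """Count occurrences of the unit sequence `pattern` in `units`.

    Truncating each sorted suffix to len(pattern) keeps the order
    non-decreasing, so all matches form one contiguous block that
    binary search can locate.
    """
    units = tuple(units)
    pattern = tuple(pattern)
    n = len(pattern)
    prefixes = [units[i:i + n] for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


# Hypothetical pre-segmented text: each string stands in for one inseparable unit.
units = ["ab", "c", "ab", "c", "d"]
suffix_array = build_suffix_array(units)
bigram_count = count_occurrences(units, suffix_array, ["ab", "c"])
```

Because every occurrence of an n-gram is the prefix of some suffix, the sorted order groups identical n-grams together, and frequency counts for candidate keywords of any length can be read off the same array.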