Identifying the coding type of Chinese text is a major and challenging issue in Chinese web content auditing and analysis. In this paper we develop a novel algorithm, based on the theory of Kolmogorov complexity, that identifies the coding type of the Chinese characters in a given text segment. An array of text compressors is used as filters to evaluate the information distance between the text under examination and the training...
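The excerpt does not spell out the algorithm. As a rough illustration of using compressors to approximate information distance, the sketch below uses the normalized compression distance (NCD), an assumed stand-in for the paper's measure. The choice of Python's `zlib`, `bz2` and `lzma` modules as the compressor array, the averaging rule, and the names `ncd` and `identify_encoding` are all hypothetical, not the authors' method.

```python
import bz2
import lzma
import zlib

# Assumed "array of compressors": each maps bytes to a compressed length.
COMPRESSORS = {
    "zlib": lambda b: len(zlib.compress(b, 9)),
    "bz2": lambda b: len(bz2.compress(b)),
    "lzma": lambda b: len(lzma.compress(b)),
}


def ncd(x: bytes, y: bytes, clen) -> float:
    """Normalized compression distance, a computable proxy for information distance."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def identify_encoding(segment: bytes, training: dict[str, bytes]) -> str:
    """Return the label of the training sample closest to `segment`.

    Hypothetical decision rule: average the NCD over all compressors and
    pick the label with the smallest average distance.
    """
    def avg_distance(label: str) -> float:
        sample = training[label]
        return sum(ncd(segment, sample, c) for c in COMPRESSORS.values()) / len(COMPRESSORS)

    return min(training, key=avg_distance)
```

A segment encoded the same way as a training sample shares byte patterns with it, so concatenating the two compresses well and the distance is small; a sample in a different encoding adds almost nothing the compressor can reuse.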
When current text clustering algorithms are applied to Chinese texts, they ignore the semantic relationships between words and suffer from high data dimensionality and computational complexity. To address these problems, this paper presents TCBS (Text Clustering Based on Semantics), a new algorithm that clusters Chinese texts in a specific field based on semantics. The algorithm is based on the agglomerative hierarchical clustering algorithm,...
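The semantic similarity used by TCBS is not described in this excerpt. To illustrate only the agglomerative hierarchical step, the sketch below uses average linkage over cosine similarity of term-frequency vectors, an assumed stand-in for the paper's semantic measure. Documents are assumed to be already segmented into words, and `agglomerative` with its `threshold` stopping rule is hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (stand-in similarity)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(docs: list[list[str]], threshold: float) -> list[list[int]]:
    """Bottom-up clustering: start with singletons, repeatedly merge the most
    similar pair (average linkage), stop when the best similarity < threshold."""
    vecs = [Counter(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1: list[int], c2: list[int]) -> float:
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters
```

Swapping `cosine` for a semantics-aware word similarity is where a method like TCBS would differ; the merge loop itself is the standard agglomerative procedure.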
Classical text clustering algorithms are usually based on the vector space model or its variants. Because of their high computational complexity and the difficulty of controlling the clustering results, such approaches are hard to apply to large-scale text clustering. Clustering algorithms based on frequent term sets make use of the relationships among documents and their shared frequent...
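The excerpt stops before describing how the frequent term sets are found. A minimal sketch, assuming an Apriori-style level-wise search, mines every term set that appears in at least a fraction `min_support` of the documents, together with its cover (the documents containing all of its terms). The function name, the `min_support` and `max_size` parameters, and the mining strategy are assumptions for illustration.

```python
from itertools import combinations


def frequent_term_sets(docs: list[set[str]], min_support: float,
                       max_size: int = 3) -> dict[frozenset, set[int]]:
    """Mine term sets contained in at least min_support * len(docs) documents.

    Returns a mapping from each frequent term set to its cover, i.e. the
    indices of the documents that contain every term of the set.
    """
    min_count = min_support * len(docs)
    # Level 1: single terms and their covers.
    current: dict[frozenset, set[int]] = {}
    for term in sorted({t for d in docs for t in d}):
        cover = {i for i, d in enumerate(docs) if term in d}
        if len(cover) >= min_count:
            current[frozenset([term])] = cover
    result: dict[frozenset, set[int]] = {}
    level = 1
    while current:
        result.update(current)
        if level == max_size:
            break
        level += 1
        next_level: dict[frozenset, set[int]] = {}
        for a, b in combinations(list(current), 2):
            cand = a | b
            if len(cand) != level or cand in next_level:
                continue
            # Apriori pruning: every (level-1)-subset must itself be frequent.
            if any(frozenset(s) not in current for s in combinations(cand, level - 1)):
                continue
            cover = current[a] & current[b]  # docs containing all terms of a and b
            if len(cover) >= min_count:
                next_level[cand] = cover
        current = next_level
    return result
```

Each frequent set and its cover can then serve as a candidate cluster, which avoids building high-dimensional document vectors; how candidates are selected is cut off in this excerpt.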
This paper presents an efficient text classification algorithm that automatically classifies and sorts similar strings in the repeated text information on e-commerce sites. The algorithm greatly increases the efficiency and accuracy of information auditing. All tests show that the algorithm is very efficient when the number of information items is between 100 and 1000, and the 1000 text information (strings)...
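The excerpt does not say how similar strings are detected. One simple way to group near-duplicate listings, assumed here purely for illustration, is greedy grouping with the standard library's `difflib.SequenceMatcher.ratio()` against a hypothetical similarity threshold, followed by sorting the groups by size so the most repeated items come first. The name `group_similar` and the default threshold of 0.8 are assumptions.

```python
from difflib import SequenceMatcher


def group_similar(strings: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group similar strings and sort groups by size (largest first).

    Each string joins the first group whose representative (first member)
    has a SequenceMatcher ratio >= threshold; otherwise it starts a new group.
    """
    groups: list[list[str]] = []
    for s in strings:
        for g in groups:
            if SequenceMatcher(None, g[0], s).ratio() >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    # Most repeated content first; list.sort is stable, so ties keep input order.
    groups.sort(key=len, reverse=True)
    return groups
```

Comparing only against each group's first member keeps the cost at roughly one comparison per existing group, which is manageable for a few hundred to a few thousand strings.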
Libraries and museums are digitizing their collections of historical cultural objects, such as historical Chinese calligraphy, to enable public access. These collections are available only in image format, and practical technology for offering even a basic search service to the public is lacking. This paper proposes a quick search approach based on a coarse-to-fine strategy. First, a long list of calligraphy characters...