Relation extraction is a challenging task in biomedical text mining due to the complexity of sentences in the biomedical literature. In this paper, we address the multi-class relation extraction problem in biomedical literature using a Maximum Entropy model with simple word features. The proposed method is applied to extract protein-protein interactions. Experiments show the method achieves an accuracy...
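A Maximum Entropy classifier is equivalent to multinomial (softmax) logistic regression. The following is a minimal sketch of one trained on binary word features, assuming plain stochastic gradient ascent on the conditional log-likelihood. The relation labels, the sentences, and the names `word_features` and `MaxEnt` are hypothetical toy choices for illustration, not the paper's data, feature set, or implementation.

```python
import math
from collections import defaultdict


def word_features(sentence):
    """Binary bag-of-words features: one feature per distinct lowercased token."""
    return set(sentence.lower().split())


class MaxEnt:
    """Multinomial maximum entropy (softmax) classifier over sparse binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (label, feature) -> weight

    def probs(self, feats):
        # Sum of active feature weights per label, then a numerically stable softmax.
        scores = {y: sum(self.weights[(y, f)] for f in feats) for y in self.labels}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=50, lr=0.5):
        # Stochastic gradient ascent on the conditional log-likelihood:
        # gradient = observed feature value - expected feature value under the model.
        for _ in range(epochs):
            for sentence, gold in data:
                feats = word_features(sentence)
                p = self.probs(feats)
                for y in self.labels:
                    g = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.weights[(y, f)] += lr * g

    def predict(self, sentence):
        p = self.probs(word_features(sentence))
        return max(p, key=p.get)


# Hypothetical toy data (assumption); real training data would be annotated sentences.
data = [
    ("protA binds protB", "interact"),
    ("protC binds protD", "interact"),
    ("protA inhibits protC", "inhibit"),
    ("protB inhibits protD", "inhibit"),
    ("protA and protB were measured", "none"),
    ("protC was measured", "none"),
]
model = MaxEnt(["interact", "inhibit", "none"])
model.train(data)
print(model.predict("protX binds protY"))  # interact
```

Because "binds" only ever occurs with the `interact` label, its weight for that label ends up positive and negative for the others, so an unseen protein pair joined by "binds" is classified as an interaction.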
Document genre information is one of the most distinguishing features in information retrieval, and it brings order to search results. Genre classification is concerned not with the topic of a document but with its genre. In this paper, two different feature sets were employed: a bag of words derived by a feature selection method, and structural features selected manually and subjectively...
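The following is a minimal sketch of how two such feature sets might be merged into one feature vector for a classifier. The `vocabulary` argument stands in for the output of an unspecified feature-selection step, and the structural features (paragraph count, sentence count, average sentence length, digit ratio) are hypothetical examples, not the ones chosen in the paper. Tokenizing with `\w+` assumes whitespace-delimited text; Chinese text would need word segmentation first.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Counts of the selected vocabulary terms (vocabulary = output of feature selection)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {"bow:" + w: counts[w] for w in vocabulary}


def structural_features(text):
    """Hypothetical hand-picked structural features of the document layout."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text)
    digits = sum(ch.isdigit() for ch in text)
    return {
        "struct:paragraphs": len(paragraphs),
        "struct:sentences": len(sentences),
        "struct:avg_sentence_len": len(tokens) / len(sentences) if sentences else 0.0,
        "struct:digit_ratio": digits / len(text) if text else 0.0,
    }


def genre_features(text, vocabulary):
    """Concatenate both feature sets into one sparse feature dictionary."""
    features = bag_of_words(text, vocabulary)
    features.update(structural_features(text))
    return features


doc = "Shares rose 5 percent.\n\nAnalysts expect growth."
print(genre_features(doc, ["shares", "growth", "weather"]))
```

Prefixing keys with `bow:` and `struct:` keeps the two feature families from colliding, so either set can be dropped to compare their individual contributions.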
Classical text clustering algorithms are usually based on the vector space model or its variants. Because of their high computational complexity and the difficulty of controlling the clustering results, such approaches are hard to apply to large-scale text clustering. Clustering algorithms based on frequent term sets make use of the relationships among documents and their shared frequent...
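The abstract is truncated before describing its own procedure, so the following is only a generic sketch of frequent-term-set clustering. It assumes whitespace tokenization, an Apriori-style search for term sets up to `max_size` terms that occur in at least `min_support` documents, and a greedy cover that repeatedly picks the term set covering the most still-unclustered documents. The function names, parameters, and the greedy rule are illustrative assumptions.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Apriori-style search for term sets contained in at least min_support documents.

    Returns {frozenset(terms): frozenset(indices of documents containing all terms)}.
    """
    doc_terms = [set(d.lower().split()) for d in docs]

    def cover(term_set):
        return frozenset(i for i, terms in enumerate(doc_terms) if term_set <= terms)

    result = {}
    candidates = [frozenset([t]) for t in sorted(set().union(*doc_terms))]
    size = 1
    while candidates and size <= max_size:
        frequent = []
        for ts in candidates:
            docs_covered = cover(ts)
            if len(docs_covered) >= min_support:
                result[ts] = docs_covered
                frequent.append(ts)
        size += 1
        # Join pairs of frequent sets into candidates one term larger (sorted for determinism).
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        candidates = sorted(joined, key=sorted)
    return result


def greedy_clusters(docs, min_support=2, max_size=2):
    """Repeatedly pick the frequent term set covering the most still-unclustered documents."""
    candidates = frequent_term_sets(docs, min_support, max_size)
    remaining = set(range(len(docs)))
    clusters = []
    while remaining and candidates:
        # Prefer larger coverage of remaining docs, then more specific (longer) term sets.
        ts, covered = max(candidates.items(),
                          key=lambda kv: (len(kv[1] & remaining), len(kv[0])))
        new_docs = covered & remaining
        if not new_docs:
            break
        clusters.append((set(ts), sorted(new_docs)))
        remaining -= new_docs
        del candidates[ts]
    return clusters, sorted(remaining)


docs = [
    "stock market shares",
    "stock market prices",
    "football match goals",
    "football match score",
]
clusters, unclustered = greedy_clusters(docs)
print(clusters, unclustered)
```

Each cluster comes with a readable label (its term set), and no document-by-document similarity matrix is built, which is the usual argument for this family of methods over vector-space clustering.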
Unknown word recognition is a very important problem in natural language processing. It has a great influence on the performance of dictionary construction and word segmentation. This paper introduces two methods to improve Chinese unknown word recognition using Conditional Random Fields (CRFs): rough labeling of characters and N-best listing. The CRF with the two methods proposed...
Document genre information is one of the most distinguishing features in information retrieval, and it brings order to search results. Genre classification is concerned not with the topic of a document but with its genre. In this paper, we examine the effectiveness of using machine learning techniques to solve genre classification of Chinese texts on the same topic, viz. finance. Based on the likelihood...
Web page content extraction can be achieved by node-based or segmentation-based algorithms built on top of the document object model (DOM). However, the node-based algorithm often removes content embedded as anchor text, while the segmentation-based approach cannot distinguish irrelevant text from content text when they fall into the same segment. The two kinds of algorithms don't keep...
Text clustering techniques are usually used to organize text documents into topic-related groups, which can help users gain a comprehensive understanding of a corpus or of the results from an information retrieval system. Most existing text clustering algorithms, which derive from traditional formatted-data clustering, rely heavily on term analysis methods and adopt the vector space model (VSM) as...
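For reference, a minimal sketch of the vector space model the abstract refers to: each document becomes a sparse term-weight vector, and document similarity is the cosine of the angle between vectors. The weighting `tf * log(N / df)` is one common TF-IDF variant, assumed here; the abstract does not say which weighting the surveyed algorithms use.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dict term -> weight) using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is the zero vector."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["stock market shares", "stock market prices", "football match goals"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share no terms get a similarity of exactly zero, which illustrates why purely term-based VSM similarity misses relationships between documents that use different words for the same topic.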
This paper proposes a sentence-based Chinese text input method system, implemented on both the Symbian S60 and Windows Mobile platforms, that is easy to use, efficient and smart. The whole system fits within 150 KB and can be integrated into cell phones, PDAs and remote controls.