The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper addresses the issue of web information extraction to support automatic teacher information management. We propose an effective approach based on block segmentation. First, the teacher introduction web pages are divided into independent blocks, where html tags and punctuation marks are used as segmentation criterion. Then CRF model is employed to label the text. We apply this approach on...
Temporal information is an important characteristic of event. It can be used in information retrieval process to organize the returned result. In Chinese, the presentations of time expression are very complex, which make it difficult to both accurately recognize a time expression and precisely connecting it with a given event in a Web page that contains multiple events. To address these problems,...
Most of the researches on Web information processing are concentrated on the Web pages and the hyperlinks among them. One of the important facts that a Web page is just one building block of the whole Website had been ignored. But the situation is gradually changed in recent years for the needs of Website reputation calculation, the high level Website structure mining etc. It causes the Website ranking...
Web page content extraction can be achieved by node-based and segmentation-based algorithms respectively on top of the document object model (DOM). However, the node-based algorithm often removes content embedded as anchor text; while the segmentation-based way can not distinguish irrelevant text from content text when they are divided into the same segment. The two kinds of algorithms don't keep...
Text clustering techniques were usually used to structure the text documents into topic related groups which can facilitate users to get a comprehensive understanding on corpus or results from information retrieval system. Most of existing text clustering algorithm which derived from traditional formatted data clustering heavily rely on term analysis methods and adopted vector space model (VSM) as...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.