Search results for: Xiaolong Wang

Items from 1 to 12 out of 12 results

chapter

A Block Segmentation Based Approach for Web Information Extraction

Chanwei Wang, Chengjie Sun, Lei Lin, Xiaolong Wang

2010 International Conference on Asian Language Processing > 154 - 157

2010 International Conference on Asian Language Processing (IALP 2010)

This paper addresses the issue of web information extraction to support automatic teacher information management. We propose an effective approach based on block segmentation. First, the teacher introduction web pages are divided into independent blocks, with HTML tags and punctuation marks used as segmentation criteria. Then a CRF model is employed to label the text. We apply this approach on...

chapter

A study of features on Primary Question detection in Chinese online forums

Lin Sun, Bingquan Liu, Baoxun Wang, Deyuan Zhang, more

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2422 - 2427

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Primary question detection in online forums is a subtask of extracting question-answer pairs. In this paper, by surveying the forms of questions in Chinese online forums, a combination of textual and N-gram features obtained via feature selection is adopted to help detect primary questions. By viewing primary question detection as a binary classification problem, decision tree classifier C4.5 and support...

chapter

A comparative study of topic models for topic clustering of Chinese web news

Yonghui Wu, Yuxin Ding, Xiaolong Wang, Jun Xu

2010 3rd International Conference on Computer Science and Information Technology > 5 > 236 - 240

2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010)

Topic models are an increasingly useful tool for analyzing semantic-level meaning and capturing topical features. However, there is little research comparing topic models. In this paper, we describe our comparative study of three topic models in the extrinsic application of topic clustering. The topic model distance is defined on the converged parameters of topic models, which...

chapter

Foxinfo1.0: A Chinese Topic-Oriented Search Engine

Ke Sun, Lei Lin, Bingquan Liu, Chengjie Sun, more

2009 International Conference on Asian Language Processing > 91 - 96

2009 International Conference on Asian Language Processing (IALP 2009)

A topic-oriented search engine (topic-search) is a new IR service that presents several types of information on a user-queried topic in one page. It first categorizes the user query into a certain domain, and then organizes several types of information based on the query keywords into a magazine-style topic page for the user. In this paper, we propose a Chinese topic-oriented search engine service,...

chapter

An Ontology-Based NLP Approach to Semantic Annotation of Annual Report

Baohua Wang, Hejiang Huang, Xiaolong Wang, Wensheng Chen

2009 International Conference on Computational Intelligence and Security > 1 > 180 - 183

2009 International Conference on Computational Intelligence and Security (CIS 2009)

Annual reports of Chinese securities companies have become the most significant and reliable source of information for domestic and foreign investors. Semantic annotation of these reports enhances information retrieval and improves interoperability. In this paper we first review the major features of annual reports in tagged PDF format, then propose a novel ontology-based NLP approach to semantically annotate...

chapter

Extracting Event Temporal Information Based on Web

Bo Yuan, Qingcai Chen, Xiaolong Wang, Liwei Han

2009 Second International Symposium on Knowledge Acquisition and Modeling > 1 > 346 - 350

2009 Second International Symposium on Knowledge Acquisition and Modeling (KAM 2009)

Temporal information is an important characteristic of an event. It can be used in the information retrieval process to organize the returned results. In Chinese, time expressions take very complex forms, which makes it difficult both to accurately recognize a time expression and to precisely connect it with a given event in a Web page that contains multiple events. To address these problems,...

chapter

Study on feature selection in finance text categorization

Changqiu Sun, Xiaolong Wang, Jun Xu

2009 IEEE International Conference on Systems, Man and Cybernetics > 5077 - 5082

2009 IEEE International Conference on Systems, Man and Cybernetics. SMC 2009

Document genre information is one of the most distinguishing features in information retrieval, and it brings order to the search results. Genre classification is concerned not with the topic but with the genre of a document. In this paper, two different feature sets were employed: bag-of-words features derived by a feature selection method, and structural features selected manually and subjectively...

chapter

Extracting Chinese question-answer pairs from online forums

Baoxun Wang, Bingquan Liu, Chengjie Sun, Xiaolong Wang, more

2009 IEEE International Conference on Systems, Man and Cybernetics > 1159 - 1164

2009 IEEE International Conference on Systems, Man and Cybernetics. SMC 2009

Extracting question-answer pairs from online forums is meaningful work because of the huge amount of valuable user-generated resources contained in forums. In this paper we consider the problem of extracting Chinese question-answer pairs for the first time. We present a strategy to detect Chinese questions and their answers. We propose a sequential rule based method to find questions in a forum thread,...

chapter

CRF-based active learning for Chinese named entity recognition

Lin Yao, Chengjie Sun, Shaofeng Li, Xiaolong Wang, more

2009 IEEE International Conference on Systems, Man and Cybernetics > 1557 - 1561

2009 IEEE International Conference on Systems, Man and Cybernetics. SMC 2009

Conditional random fields (CRFs) have been used for many sequence labeling tasks and have achieved excellent results. Further, the supervised model depends strongly on a huge amount of training data. Active learning is an alternative to relying on a large amount of random sampling. However, random sampling constructively participates in the optimal choice of training examples. Based on different query strategies,...

chapter

Basic semantic units based web page content extraction

Jingqi Wang, Qingcai Chen, Xiaolong Wang, Hongzhi Guo

2008 IEEE International Conference on Systems, Man and Cybernetics > 1489 - 1494

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Web page content extraction can be achieved by node-based or segmentation-based algorithms on top of the document object model (DOM). However, the node-based algorithm often removes content embedded as anchor text, while the segmentation-based approach cannot distinguish irrelevant text from content text when they are divided into the same segment. The two kinds of algorithms don't keep...

chapter

Semantic feature reduction in Chinese document clustering

Xianjun Meng, Qingcai Chen, Xiaolong Wang

2008 IEEE International Conference on Systems, Man and Cybernetics > 3721 - 3726

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Text clustering techniques are usually used to structure text documents into topic-related groups, which helps users gain a comprehensive understanding of a corpus or of the results from an information retrieval system. Most existing text clustering algorithms, which derive from traditional formatted-data clustering, rely heavily on term analysis methods and adopt the vector space model (VSM) as...

article

A New Measurement of Systematic Similarity

Yi Guan, Xiaolong Wang, Qiang Wang

IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and... > 2008 > 38 > 4 > 743 - 758

The relationship of similarity may be the most universal relationship that exists between every two objects in either the material world or the mental world. Although similarity modeling has been the focus of cognitive science for decades, many theoretical and realistic issues are still under controversy. In this paper, a new theoretical framework that conforms to the nature of similarity and incorporates...

INFONA - science communication portal