Opinion mining is a task of growing interest in both research and practical applications. It deals with the computational treatment of opinion, sentiment, and subjectivity in documents. This paper focuses on retrieving opinion documents and determining their sentiment orientation. Mining and ranking of topic-relevant opinion documents are implemented with a sentiment model, combining the existing knowledge...
The performance of a statistical machine translation (SMT) system is affected by model parameters (e.g. weights of feature functions), which are usually tuned on a development corpus. Most research to date has focused on algorithms for tuning parameters. However, the selection of the development corpus has received little discussion. It is believed that the parameters trained on a proper corpus will improve...
For the web content extraction task, researchers have proposed many different methods, such as wrapper-based, DOM tree rule-based, and machine learning-based methods. To some extent, all these methods ignore the layout information of the webpage, although layout information such as spatial and visual cues often plays a very important role in the process of locating the main...
Toponym Disambiguation (TD) is a crucial technique in Geographic Information Retrieval (GIR) systems. It directly affects the quality of the subsequent assignment of geographic focus to a document, the establishment of the spatial index, and the effectiveness of the retrieval model as a whole. We explore the mechanism by which human beings deal with the problem of TD. Human's...
Geographical information has become a very important attribute of web documents, given that a large proportion of documents on the web contain geographical information. Geographical information retrieval (GIR) systems can identify this geographical information and automatically extract the geographical focus of the documents, hence supporting geo-related queries for information...
This paper studies web wrapper generation for the web pages of forum, blog, and news web sites. More and more web pages are dynamically generated using a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning to generate the wrapper from this kind of web page. We present a new tree alignment algorithm to find...
To improve the effectiveness of cross-lingual information retrieval (CLIR), a method based on domain ontology knowledge is presented and applied to C-E CLIR. In this study, domain ontology knowledge is acquired from both source-language user queries and target documents to select target translations and re-rank the initially retrieved document set. The C-E CLIR dataset from the NTCIR-4 Workshop is used to...
In information retrieval, users hope to find more relevant information among the top-ranked documents. In this paper, a combination of ontology with a statistical method is presented to retrieve an initial document set and improve the precision of the top N ranked documents by re-ranking the document set. The experiment with the NTCIR-3 Chinese CLIR dataset shows the proposed method improved the precision...