This paper analyzes the main methods for sentence classification in scientific papers and evaluates their feasibility on unstructured papers in the Software Engineering area, with the aim of automatically finding study results in this area. Tests conducted with the existing methods on unstructured Software Testing papers showed results far below those reported by the authors...
Author name disambiguation makes it possible to distinguish between two or more authors sharing the same name. In a previous paper, we proposed a name disambiguation framework in which, for each author name in each article, we build a context consisting of classification codes, bibliographic references, co-authors, etc. Then, by pairwise comparison of contexts, we have been grouping contributions likely...
Document classification is critical due to the explosive growth of text in the modern world. However, most existing document classification algorithms are easily affected by noisy data. Therefore, in document classification tasks, the ability to control noise is as important as the ability to classify accurately. In this paper, we propose a novel classification framework based on fuzzy formal concept...
A novel solution is proposed to the important problem of learning the real querying preferences and intentions of users who need to retrieve interesting information from a database but cannot specify their information needs and/or intentions in a query language because they lack knowledge and/or experience. The solution is based on the presentation to the user of consecutive...
Journalists increasingly turn to social media sources such as Facebook or Twitter to support their coverage of various news events. For large-scale events such as televised debates and speeches, the amount of content on social media can easily become overwhelming, yet still contain information that may aid and augment reporting via individual content items as well as via aggregate information from...
Word Sense Disambiguation (WSD) is a main task in the area of natural language processing (NLP). Supervised WSD methods have been shown to be more effective than other WSD methods, but they are limited by the size of the manually annotated training set. On the other hand, a concept graph is a weighted graph in which each edge represents the relationship between a pair of concepts (the relevancy of each pair of concepts)....
The bag-of-visual-words model has been widely used in many applications, such as object recognition, image categorization, and visual information retrieval. However, most existing approaches construct a visual vocabulary by simply clustering image regions represented with low-level visual features, so the spatial context of image regions is not well utilized. In this paper, we present two techniques...
Web pages are conventionally represented for classification purposes by the words found within their contents. However, word-based web page representation suffers from several limitations, such as synonymy and homonymy. Motivated by the limitations of word-based representation, we explore the potential of representing web pages using information extraction patterns, in addition to words that are identified...
The traditional PageRank (PR) algorithm takes only the Web link structure into account; when distributing rank scores it treats all links equally, which results in topic drift. In this paper, a latent semantic model (LSM) is used to calculate the similarity between Web pages, and the LSMPageRank (LPR) algorithm is introduced. In this algorithm, the value of a parent page is distributed to the child on the basis...
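The idea of distributing a parent page's score unequally among its children can be illustrated with a small sketch. This is not the LSMPageRank algorithm itself, whose details are cut off in the abstract above; it is a generic similarity-weighted PageRank under stated assumptions. The function name `weighted_pagerank`, the `similarity` callable (a stand-in for whatever the latent semantic model would provide), the damping factor 0.85, and the even handling of pages without outgoing links are all hypothetical choices made for illustration.

```python
def weighted_pagerank(links, similarity, damping=0.85, iterations=50):
    """Power-iteration PageRank where each parent splits its score among
    its children in proportion to similarity(parent, child).

    links:      dict mapping a page to the list of pages it links to
    similarity: callable (parent, child) -> non-negative float; a
                hypothetical stand-in for an LSM-based similarity
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {c for cs in links.values() for c in cs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the same teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for parent in pages:
            children = links.get(parent, [])
            weights = [similarity(parent, c) for c in children]
            total = sum(weights)
            if not children or total == 0:
                # Assumption: a page with no usable out-links spreads
                # its score evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[parent] / n
                continue
            # Distribute the parent's score by normalized similarity.
            for child, w in zip(children, weights):
                new_rank[child] += damping * rank[parent] * w / total
        rank = new_rank
    return rank


# Toy graph: "a" links to "b" and "c", and is assumed far more similar to "b".
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_scores = {("a", "b"): 0.9, ("a", "c"): 0.1}
ranks = weighted_pagerank(toy_links, lambda p, c: toy_scores.get((p, c), 1.0))
```

With a constant similarity this reduces to ordinary PageRank, which treats all links equally; with unequal similarities, a child that is more similar to its parent receives a larger share of the parent's score.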
A mobile device can be used as a medium to send and receive mobile Internet content. However, using the mobile Internet has several limitations. Content personalisation has been viewed as an important area when using the mobile Internet. For personalisation to be successful, understanding the user is important. In this paper, we explore the implementation of the user profile at client-side,...
Most existing corpus-based relation extraction techniques focus on predefined relations. In this paper, a clustering-based method is presented for domain-relevant relation extraction, including both relation type discovery and relation instance extraction. Given two raw corpora, one in the general domain and one in an application domain, domain-specific verbs connecting different instances are extracted...
There are a lot of brief semi-structured messages on the Internet. A semantic matching method based on the contexts of concepts is proposed. Each word of the space vectors representing brief messages is extended with synonyms in a controlled process according to information gain scores. Patterns of messages are extracted with a naive Bayesian classifier. The similarity between a query and the message...