Considering today's surge of information, the need for well-organized knowledge bases is increasing rapidly to provide simplified access to knowledge and its further processing. In the biomedical domain, heaps of information are buried in scientific publications and online forums. This calls for representing this information in a more expressive, semantic way by determining and storing relational information...
This paper compares the efficiency of two implementations of the PageRank algorithm, one using Hadoop's MapReduce and the other Giraph's Pregel. The implementations are evaluated on CPU usage, CPU I/O wait time, memory usage, and execution time. The evaluation shows that Giraph outperforms Hadoop.
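Both implementations compute the same underlying iteration. As a point of reference, here is a minimal, framework-free sketch of the PageRank computation they distribute; the toy graph and the damping factor d = 0.85 are illustrative assumptions, not taken from the paper.

```python
def pagerank(graph, d=0.85, iterations=20):
    """Iterative PageRank. graph: dict mapping node -> list of out-neighbours."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Every node keeps the teleport share (1 - d) / n ...
        new_rank = {node: (1.0 - d) / n for node in graph}
        for node, out_links in graph.items():
            if out_links:
                # ... and distributes d * rank equally over its out-links.
                share = rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += d * share
        rank = new_rank
    return rank

# Toy three-node graph, purely for illustration.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In the MapReduce version each iteration is one job (map emits the rank shares, reduce sums them per node); in the Pregel version each iteration is one superstep in which vertices exchange their shares as messages, which avoids re-reading the graph from disk every round.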
This paper describes the construction of an intelligent semantics-based system that exploits several knowledge bases to tell contextually relevant stories to individuals and groups. Starting from information stored in user profiles, textual queries, and pictures, a set of readily available tools recognizes topics of interest and features of context; we then run data mining and semantic reasoning...
In this paper we propose a probabilistic topic model that incorporates DBpedia knowledge for tagging Web pages and online documents with the topics discovered in them. Our method is based on integrating the DBpedia hierarchical category network with statistical topic models, where DBpedia categories are considered as topics. We have conducted extensive experiments on two different...
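The core idea of treating knowledge base categories as topics can be illustrated with a much simpler heuristic: tag a document with the category whose associated vocabulary best matches its words. The tiny category lexicons below stand in for DBpedia's category network, and the word-overlap score is only a stand-in for the paper's probabilistic topic model.

```python
def tag_document(text, category_words):
    """Return the category whose vocabulary overlaps the document's words most."""
    words = set(text.lower().split())
    scores = {cat: len(words & vocab) for cat, vocab in category_words.items()}
    return max(scores, key=scores.get)

# Hypothetical mini-lexicons standing in for DBpedia categories.
category_words = {
    "Medicine": {"patient", "disease", "treatment", "clinical"},
    "Computing": {"algorithm", "software", "data", "network"},
}
topic = tag_document("a clinical trial of a new treatment", category_words)
```

A real category network would also let matches propagate up the hierarchy (a match on a subcategory counts toward its parents), which the flat dictionary here does not model.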
Spam is an unsolicited message, usually sent in bulk. It is an unwanted activity performed to deceive people, to steal their personal information, to inject viruses into their systems, or to redirect them to malicious sites. On OSNs, spammers share malicious links that look like genuine ones, place discount messages on users' walls, develop malicious apps, and sometimes create fake accounts. While on blog sites,...
Extraction and integration of entities from textual data, and linking them to knowledge bases (for further information or processing), is useful for many applications in natural language processing. However, a major problem in this process is disambiguation: named entities might refer to different things. In this work, we propose a novel method to disambiguate named entities. Our method is a combination...
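The abstract truncates before describing the method, so as a baseline illustration of the disambiguation problem itself, here is a simple context-overlap sketch: each candidate knowledge base entry is scored by how many words it shares with the mention's surrounding text. The candidate entries are invented for this example and do not reflect the paper's actual approach.

```python
def disambiguate(context, candidates):
    """Pick the candidate entity whose description best overlaps the context."""
    context_words = set(context.lower().split())

    def overlap(entry):
        return len(context_words & set(entry["description"].lower().split()))

    return max(candidates, key=overlap)

# Hypothetical knowledge base entries for the ambiguous mention "Paris".
candidates = [
    {"id": "Paris_(city)", "description": "capital city of france on the seine"},
    {"id": "Paris_(mythology)", "description": "prince of troy in greek mythology"},
]
best = disambiguate("paris is the capital of france", candidates)
```

Real entity linkers refine this idea with entity popularity priors, coherence among all entities in a document, and learned similarity functions rather than raw word overlap.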
Disadvantages of modern online mass-communication tools are discussed in this paper. A new model for online forums is proposed to eliminate these disadvantages, and algorithms for the model are proposed. Methods for implementing these algorithms are reviewed. The software architecture implementing the proposed model is also described.
The World Wide Web has emerged as a huge repository of knowledge, and many websites provide a great deal of information on any given topic of interest. In this paper, this feature of the WWW is exploited for the concept of a dynamic encyclopedia. Beyond traditional web search and retrieval, this paper deals with the construction of a web encyclopedia page by making use of relevant information from various...
Analysis of digital discourse in social networks can inform decisions about service design and align services with customers' needs and expectations. Using the theory of social representations (SRT) as a theoretical lens, I propose a method for the systematic analysis of the digital discourse in order to identify the core representations on which a service depends. I demonstrate the method by analyzing...
Wikipedia is a central source of information: 450 million people consult the online encyclopaedia every month to satisfy their information needs. Some of these users also refer to Wikipedia within their tweets. In this paper, we analyse links within tweets that refer to a Wikipedia in a language different from the tweet's language, and we investigate causes for the usage of such inter-language...
We apply the notion of “popularity” in machine-generated sentence evaluation to the queries used to search for documents. Our intuition is that queries composed of popular terms obtain more relevant documents and increase the probability that these documents contain the desired results. We measure the popularity of a query by analyzing a massive online document repository, Korean Wikipedia. To verify...
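The popularity measure described above can be sketched as the average document frequency of a query's terms in a reference corpus. The paper measures this over Korean Wikipedia; the tiny corpus and the simple averaging scheme below are assumptions for demonstration only.

```python
from collections import Counter

def popularity(query, corpus):
    """Mean fraction of corpus documents containing each query term."""
    doc_freq = Counter()
    for doc in corpus:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    terms = query.lower().split()
    return sum(doc_freq[t] / len(corpus) for t in terms) / len(terms)

# Toy stand-in for the document repository.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird flew over the house",
]
common = popularity("the cat", corpus)    # terms appearing in most documents
rare = popularity("bird house", corpus)   # terms confined to one document
```

Under this measure, a query built from widely used terms scores higher than one built from rare terms, matching the intuition that popular terms retrieve more candidate documents.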
Comprehensibility is an important quality aspect of documents. Incomprehensible documents are of little utility to readers even if they are relevant. However, for many difficult queries such as technical ones, the topically relevant documents tend to be characterized by poor comprehensibility. This makes it difficult for users to satisfy their information needs when searching for documents about difficult...
In this paper we tackle the estimation of apparent age in still face images with deep learning. Our convolutional neural networks (CNNs) use the VGG-16 architecture [13] and are pretrained on ImageNet for image classification. In addition, due to the limited number of apparent age annotated images, we explore the benefit of finetuning over crawled Internet face images with available age. We crawled...
State-of-the-art machine-learning keyphrase extraction systems do not take into consideration the fact that some of these keyphrases may not be found in the text. These systems therefore typically use a training set restricted to textual terms, reducing the learning capabilities of any inductive algorithm. Our research investigates ways to improve the accuracy of these systems by allowing classification...
The extraction of semantic contexts is a relevant issue in information retrieval to provide high quality query results. This paper introduces the semantic context underlying a set of given input concepts as defined by the relevant multiple explanation paths connecting the input concepts in a collaborative network. A pheromone-like model based on this approach is introduced for the detection and the...
Most current text understanding techniques rely on an ontology engine and external knowledge resources to reach a deep comprehension. In this paper, we propose a computerized text comprehension technique for a given text. The technique achieves deep text comprehension through iterative reading of reference texts related to the given text, using an ontology engine. Performance analysis...
Nowadays, an ever increasing number of news articles is published on a daily basis. Especially after notable national and international events or disasters, news coverage rises tremendously. Temporal summarization is an approach to automatically summarize such information in a timely manner. Summaries are created incrementally with progressing time, as soon as new information is available. Given a...
With the growth of Linked Data, updating knowledge bases (KBs) is becoming a crucial problem, particularly when representing knowledge linked to permanently evolving instances. Many approaches have been proposed to extract new knowledge from textual documents in order to update existing KBs. These approaches have reached maturity but rely on the assumption that an adequate corpus has already been constructed. In...
The enormous efforts of human volunteers have made Wikipedia a treasure of textual knowledge. Relation extraction, which aims at extracting structured knowledge from the unstructured texts in Wikipedia, is an appealing but quite challenging problem because it is hard for machines to understand plain text. Existing methods are not effective enough because they understand relation types only at the textual level...