Despite the significant contribution from specialized ontologies and text mining methods, evaluating the semantic similarity of genes remains difficult because of the complex functions in which genes are involved. A less exploited resource is Wikipedia, which stores more than 10,400 articles about human genes: each gene name identifies the corresponding Wikipedia page summarizing the gene's properties...
Wikipedia is an online encyclopedia that contains millions of articles across different subject domains. Wikipedia also has its own search page, which displays links to the Wikipedia articles matching a given user query. This results page lists the results in order of relevance, without any content-based grouping. This paper presents an experimental deduction...
Entity Linking (EL) search and labeling are important research topics with various web applications. The challenge is to find the important concepts in web text, rather than only simple personal and place names, and link them to online encyclopedia databases. This paper presents a new approach to linking concrete concepts from English texts with Wiki entities. Using part-of-speech tagging to detect concrete concepts,...
Text mining is a set of techniques that analyze large masses of data, extract relations that are unknown beforehand, and provide solutions to help decision-making. Text mining has been used extensively to analyze English text, but it has only recently been applied to Arabic text. Accordingly, the objective of this paper is to present the current state of Arabic text mining. A...
Topic modeling has attracted much attention from investigators, as it provides users with insights into huge volumes of documents. However, most previous related studies based on Non-negative Matrix Factorization (NMF) neglect to determine which topics are widespread in the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, have great significance...
The time-sensitive nature of news articles implies that a change in the volume of internet searches for a particular item, triggered by the appearance of news, lasts for a few days, after which the normal search pattern resumes. This paper presents a cloud service that describes how the popularity of mass media news can be assessed using users' online usage behavior. We used data from...
Hierarchical Cluster Labeling helps users to quickly understand and analyze hierarchical clusters. It can be used to enhance search engine results or interactive browsing, as is done in the Blog Intelligence application. The hierarchical organization of data helps to represent different levels of detail. Hierarchical clustering may be quite common, but there are few good solutions for...
This paper discusses an application of some statistical Natural Language Processing algorithms to a set of Wikipedia articles about top tourist destinations. The objective is to automatically identify the key features of each destination and then discover other destinations which share similar sets of features. In this way, a method is demonstrated by which metadata about each article can be...
This paper aims to lay the foundations of an anaphora resolution framework able to process all types of hypertexts and treat all types of anaphors for the English language. To this end, we provide a linguistically unambiguous and extensive definition and categorization of the concept of anaphora. We introduce a new corpus, and use our proposed categorization to statistically analyze it. Finally, we...
Finding pages on the Web that are similar to a query page is an important component of modern search engines. In particular, methods for recognizing the content of Web pages play an important role in search engines. However, the fact that a Web page includes the query words does not necessarily mean that the page describes the query. The main challenge here is identifying the factors that affect the relationship between the query and...
This paper introduces a new technique to select candidate sentences for alignment from bilingual comparable corpora. Tests were done utilizing Wikipedia as a source for bilingual data. Our test languages are English and Chinese. A high quality of sentence alignment is illustrated by a machine translation application.
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents...
In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an...
This paper addresses practical aspects of Web page classification not captured by the classical text mining framework. Classifiers are supposed to perform well on a broad variety of pages. We argue that constructing training corpora is a bottleneck for building such classifiers, and that care has to be taken if the goal is to generalize to previously unseen kinds of pages on the Web. We study techniques...