In this project we propose a new approach for emotion recognition using web-based similarity (e.g. confidence, PMI and PMING). We aim to extract basic emotions from short sentences with emotional content (e.g. news titles, tweets, captions), performing a web-based quantitative evaluation of semantic proximity between each word of the analyzed sentence and each emotion of a psychological model (e.g...
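As a rough illustration of two of the web-based measures named above, the sketch below computes confidence and PMI from search-engine hit counts and picks the emotion closest to a word. The function names, the hit counts, and the index size `n` are hypothetical and are not taken from the project; PMING is omitted because its definition is not given in the abstract.

```python
import math

def confidence(f_x, f_xy):
    """Fraction of pages containing x that also contain y."""
    return f_xy / f_x

def pmi(f_x, f_y, f_xy, n):
    """Pointwise mutual information in bits, estimated from hit counts.

    f_x, f_y: pages containing each term; f_xy: pages containing both;
    n: assumed total number of indexed pages.
    """
    return math.log2((f_xy * n) / (f_x * f_y))

def closest_emotion(word_hits, emotion_hits, joint_hits, n):
    """Return the emotion with the highest PMI against the word."""
    return max(
        emotion_hits,
        key=lambda e: pmi(word_hits, emotion_hits[e], joint_hits[e], n),
    )

# Hypothetical hit counts for one word against two emotion terms.
emotion_hits = {"joy": 100, "fear": 100}
joint_hits = {"joy": 20, "fear": 10}
best = closest_emotion(100, emotion_hits, joint_hits, 1000)
```

With these made-up counts the word co-occurs with "joy" twice as often as chance would predict, so "joy" is selected.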
LSI and LDA are widely used techniques to uncover the underlying topical structure of text. They traditionally rely on bag-of-words representation of documents and term frequency-based (TF) weighting schemes. In this paper, we represent documents as graph-of-words to capture the relationships between close words and propose the number of contexts of co-occurrences as alternative term weights (TW)...
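One way to picture the graph-of-words weighting described above is the minimal sketch below. It assumes an undirected graph built with a sliding window and takes a term's weight to be its number of distinct neighbours; the window size, the function name, and the choice of an undirected graph are assumptions, since the abstract does not specify them.

```python
from collections import defaultdict

def graph_of_words_weights(tokens, window=3):
    """Weight each term by the number of distinct terms it co-occurs with.

    Two terms are linked when they appear within `window` consecutive
    tokens of each other (assumed definition of a co-occurrence context).
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        # Link the current term to the following window - 1 tokens.
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    return {term: len(linked) for term, linked in neighbors.items()}

weights = graph_of_words_weights(
    "information retrieval ranks documents by information need".split(),
    window=2,
)
```

Unlike raw term frequency, a term repeated in the same context gains no extra weight here; only new neighbouring terms increase it.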
Measuring the similarity between strings plays an increasingly important role in many applications such as information retrieval, short answer grading, and conversational agent software. There has been much recent research interest in applying string similarity within Arabic language applications; however, the use of string similarity in Arabic poses substantial challenges, such as the complexity...
Relevance is one of the most interesting topics in the information retrieval domain. In this paper, we introduce another method of relevance calculation. We propose to use the implicit opinion of users to calculate relevance. The implicit judgment of users is injected into the documents by calculating different kinds of weighting. These weightings cover several criteria, such as the user's weight in the query's...
Among the enormous variety of data in recent years, transportation data contain significant potential for understanding the information requirements and intention of passengers. In this paper, we propose a new information ranking method for passenger intention prediction and service recommendation. The method has three main features: (1) predicting the intention of a user based...
In recent years, the fast growth of Web pages and the constant evolution of internet technologies have led to a significant increase in the number of pedagogical resources. Thus, the indexing and search problems have become crucial. To overcome this problem, it was proposed to use information coming from the norms and standards of educational metadata. However, this solution does not solve completely...
This paper presents a method named SoSVMRank, which integrates the social information of a Web document to generate a high-quality summary. To do that, summarization was formulated as a learning-to-rank task, in which the order of a sentence or comment was determined by its informativeness. Informativeness was measured by a set of local and social features in...
This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between...
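The idea of terms as nodes and intersecting postings lists as edges can be sketched roughly as follows. This is a toy illustration rather than the system described in the paper: the function names and the sample documents are hypothetical, and the complementary uninverted index is left out.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def edge(index, term_a, term_b):
    """Documents shared by both terms' postings lists (the edge between them)."""
    return index.get(term_a, set()) & index.get(term_b, set())

# Hypothetical three-document corpus.
docs = {
    1: "java developer",
    2: "java barista coffee",
    3: "coffee barista",
}
index = build_inverted_index(docs)
shared = edge(index, "java", "barista")
```

An empty intersection means the two terms never co-occur in a document, so no edge exists between them in this toy graph.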
This paper presents a method for semantic class disambiguation for all words. Unlike ordinary word sense disambiguation, a set of semantic classes, or coarse-grained senses, is defined as a common sense inventory; universal classifiers, applicable to all words, are then trained by supervised learning to select the appropriate semantic class of a target word in a given context. In the...
Synonym-based searching is considered to be a complicated problem, as text mining from unstructured web data is challenging. Finding useful information that matches the user's need in the bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration...
Traditional Information Retrieval (IR) models are based on the bag-of-words paradigm, where relevance scores are computed based on exact matching of keywords. Although these models have already achieved good performance, it has been shown that most dissatisfaction cases in relevance are due to term mismatch between queries and documents. In this paper, we introduce a novel method to compute term frequency...
Although the volume of online educational resources has dramatically increased in recent years, many of these resources are isolated and distributed in diverse websites and databases. This hinders the discovery and overall usage of online educational resources. By using linking between related subsections of online textbooks as a testbed, this paper explores multiple knowledge-based content linking...
In this paper, we present a statistical approach to semantic indexing for multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps; we propose, in the first step, the extraction of...
Taking advantage of the large scale corpus on the web to effectively and efficiently mine the topics within texts is an essential problem in the era of big data. We focus on the problem of learning text topic embedding in an unsupervised manner, which enjoys the properties of efficiency and scalability. Text topic embedding represents words and documents in a semantic topic space, in which the words...
Internet inquiry is playing an increasingly important role as a complement to the traditional medical service system, especially the recommendation of similar cases. It can not only save patients' waiting time, but also make use of historical resources, since many cases with the same purpose have already been solved perfectly. However, because of the diversity and non-standard nature of the patients' descriptions,...
The large volume of information stored in electronic health records is very valuable in the medical field, e.g., for clinical research and administrative purposes. However, health care professionals still face difficulties in retrieving and selecting relevant data. Although the literature has investigated the influence of lexical, syntactic and semantic parameters in information retrieval techniques, few...
Word Sense Disambiguation (WSD) is the task of automatically choosing the correct meaning of a word in a context. It is considered one of the most important and challenging problems in the field of computational linguistics and plays a crucial role in various natural language processing (NLP) applications. In this paper, we present an improved version of a recent...
The Web is a popular, easy and common way to propagate information today, and with the growth of the Web, Web service discovery has become a challenging task. Clustering Web services into similar clusters by calculating the semantic similarity of Web services is one way to overcome this issue. Several methods are used in the current similarity calculation process, such as knowledge based,...
Existing work in the semantic relatedness literature has already considered various information sources such as WordNet, Wikipedia and Web search engines to identify the semantic relatedness between two words. We will show that existing semantic relatedness measures might not be directly applicable to microblogging content such as tweets due to i) the informality and short length of microblogging...
Document similarity analysis is increasingly critical since roughly 80% of big data is unstructured. Accordingly, semantic couplings (relatedness) have been recognized as valuable for capturing the relationships between terms (words or phrases). Existing work focuses more on explicit relatedness, with corresponding models built for it. In this paper, we propose a comprehensive semantic similarity measure: Semantic...