We evaluate the suitability of latent and explicit semantic spaces of documents for Information Retrieval (IR) tasks using a dataset obtained from the Q&A community Stackexchange. In addition, the ability of the latent semantic spaces to reconstruct human relevance judgments is explored. The latent semantic spaces are generated with Latent Dirichlet Allocation (LDA), while explicit semantic spaces...
This paper introduces the problem of topical sequence profiling. Given a sequence of text collections such as the annual proceedings of a conference, the topical sequence profile is the most diverse explicit topic embedding for that text collection sequence that is both representative and minimal. Topic embeddings represent a text collection sequence as numerical topic vectors by storing the relevance...
Literature recommender systems support users in filtering the vast and increasing number of documents in digital libraries and on the Web. For academic literature, research has proven the ability of citation-based document similarity measures, such as Co-Citation (CoCit), or Co-Citation Proximity Analysis (CPA) to improve recommendation quality. In this paper, we report on the first large-scale investigation...
General purpose Search Engines (SEs) crawl all domains (e.g., Sports, News, Entertainment) of the Web, but sometimes the informational need of a query is restricted to a particular domain (e.g., Medical). We leverage the work of SEs as part of our effort to route domain specific queries to local Digital Libraries (DLs). SEs are often used even if they are not the “best” source for certain types of...
In current education, it is difficult for a teacher to know the engagement of each student, the contents that students cannot understand and the reason why students cannot perform sufficiently in the quizzes and exams. To study student engagement in the classroom, we digitize materials used in lectures, including textbooks, and collect event logs of tablets used by students. By analyzing these logs, we...
One of the most crucial problems in any Natural Language Processing (NLP) task is the representation of time. This includes applications such as Information Retrieval (IR), Information Extraction (IE) and Question Answering (QA) systems. This paper deals with temporal information involving several forms of inference in the Arabic language.
Information Extraction is an important task in Natural Language Processing research. Named Entity Recognition (NER) is one of the basic tasks of information extraction, and its performance has a great impact on subsequent tasks such as Relation Extraction. A major difficulty of NER lies in identifying unknown words. To address this issue, a method of exploiting external information from Wikipedia was studied...
Young users, particularly students, can access various sources of information nowadays. However, they normally lack the experience to assess information credibility. This study aims to explore students' choices of information sources, gather their points of view about information characteristics and information verification approaches, and compare these perceptions/intentions to their...
Quantifying the semantic relation between words is a key element in several applications that process text at the meaning level. A great variety of approaches have been proposed to quantify the semantic proximity between concepts or words. These approaches exploit computational models including the hierarchical and textual information of the semantic resources. Among these models, the distributional...
Trolling describes a range of antisocial online behaviors that aim at disrupting the normal operation of online social networks and media. Combating trolling is an important problem in the online world. Existing approaches rely on human-based or automatic mechanisms for identifying trolls and troll posts. In this paper we take a novel approach to the trolling problem: our goal is to identify the targets...
Semantic relations play an important role in knowledge acquisition research. This paper proposes a method of semantic relation acquisition and automatic synthesis based on Wikipedia. First, we obtain three kinds of basic semantic relations from Wikipedia and extend the semantics of concepts to address the problem of semantic fuzziness in the semantic relations. Then, an automatic synthesis...
Hyponymy is one of the most critical semantic relations and contributes greatly to semantic dictionaries, information retrieval, etc. In this paper, a method of extracting hyponymy is proposed based on the fusion of multiple data sources, which converts the extraction of hyponymy into the extraction of hypernyms for target words. First, mining candidate hypernyms for the target words based on search engine,...
Question Answering (QA) is the task in which an arbitrary question is posed in natural language and a brief, concise text is returned as an answer. In contrast to search engines, which return a long list of relevant documents in response to a query, a QA system aims to provide the direct answer or a passage containing the answer. We propose a general purpose question answering...
This paper proposes an approach to finding answers within single text for a given question through extracting a network of categories from Wikipedia as background knowledge to support matching between question and answer. Experiments show that the approach is effective for keyword-based QA.
This paper proposed a textual entailment classification system developed based on a dataset focusing on individual entailment-related linguistic phenomena. Identical and synonymous terms in the text pair were aligned and ignored. Several groups of classification rules have been proposed with respect to the difference between the sentences in the text pair. The set of Wikipedia redirected titles became...
Exploratory search, in which a user aims to better understand complex concepts, is cumbersome with today's search engines. Query expansion techniques have been widely used in exploratory search. However, query expansion often recommends queries that differ from the user's search intentions due to different contexts. Yet, many of users' needs could be addressed by asking people via popular Community Question...
Because short texts carry little information, traditional keyword extraction often performs worse on them than expected. In this paper, we propose a graph-based ranking algorithm that exploits Wikipedia as an external knowledge base for short-text keyword extraction. To overcome the sparsity of information in short text, we use Wikipedia to enrich the short text....
The problem of entity resolution is widely studied in the research community, where the goal is to identify real users associated with the user references in the documents. We focus on the problem of entity resolution in dyadic data, where associations between one pair of domain entities such as documents-words and associations between another pair, such as documents-users are observed, the example...
This paper presents an approach to building multilayer cognitive maps and gives an example of its use. When modeling a problem with cognitive maps, it is possible to use several cognitive maps, each of which expresses a different aspect (knowledge) of the problem, but which must be interlinked. That is, a multilayer cognitive map can enrich the modeling of a system, with the flow...
With the rapid development of biomedical sciences, a growing number of papers reporting new scientific findings are published and indexed in different unstructured biomedical data sources. In order to really appreciate and effectively benefit from the availability of this amount of data, there is an urgent need to support the deployment of intelligent information services, such as: temporal trends...