In recent years, the rapid development of geographic information system technology and the popularity of geo-location-based mobile information services have led people to pay more attention to geography-related information. Thus, information retrieval and related services based on geography have broad application prospects. However, the traditional search engine's processing of geographic...
Knowledge graph technology belongs to the field of artificial intelligence and is widely used in semantic search and intelligent question answering. Constructing a Uyghur knowledge graph is of great value for Uyghur information processing and Uyghur application software development. Firstly, this paper describes the definition and structure of the knowledge graph, then it reviews the related research...
Nowadays cross-media retrieval is a useful technology that helps people find the information they expect in huge amounts of multimodal data more efficiently. A common cross-media retrieval framework first maps features of different modalities into an isomorphic semantic space so that the similarity between heterogeneous data can be measured. For most semantic-space-based methods, the mapping...
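The map-then-match framework described in this abstract can be sketched in a few lines. This is a minimal, generic illustration, not the paper's trained model: the projection matrices `W_img` and `W_txt` are hypothetical and randomly initialized here, whereas in practice they would be learned (e.g. by canonical correlation analysis or a deep network).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained linear projections into a shared 8-dim semantic space.
W_img = rng.normal(size=(8, 4096))   # image features  -> semantic space
W_txt = rng.normal(size=(8, 300))    # text features   -> semantic space

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cross_modal_similarity(img_feat, txt_feat):
    """Project both modalities into the shared space, then compare there."""
    return cosine(W_img @ img_feat, W_txt @ txt_feat)

img = rng.normal(size=4096)  # stand-in for a CNN image feature vector
txt = rng.normal(size=300)   # stand-in for an aggregated text embedding
score = cross_modal_similarity(img, txt)
```

Once both modalities live in the same space, any vector similarity (cosine here) makes heterogeneous data directly comparable.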
Cross-modal retrieval, which aims to solve the problem that the query and the retrieved results come from different modalities, is becoming increasingly essential with the development of the Internet. In this paper, we mainly focus on exploring high-level semantic representations of image and text for cross-modal matching. Deep convolutional image features and Fisher Vectors with neural word embeddings...
Comparable corpora contain significant quantities of useful data for Natural Language Processing tasks, especially in the area of Machine Translation, as they are a major source of parallel text fragments. This paper investigates how to effectively extract bilingual texts from comparable corpora relying on a small parallel training corpus. We propose a new technique to filter non-parallel articles...
The traditional Web is gradually evolving through the adoption of newer techniques, including the Semantic Web. Integrating web content using ontologies in a language-independent manner is a required feature of this process. For better utilization of resources, the ontology that serves as the central knowledge repository must be language independent as well...
Quite a number of recent works have concentrated on the task of recommending to Twitter users whom they should follow, among them the WTF (Who To Follow) service provided by Twitter. Recommenders are based on the user's network structure, on some notion of topical similarity with other users, or on both. We present a method for the analysis of Twitter users supported by a hierarchical representation...
Enormous efforts by human volunteers have made Wikipedia a treasure trove of textual knowledge. Relation extraction, which aims to extract structured knowledge from the unstructured texts of Wikipedia, is an appealing but quite challenging problem because it is hard for machines to understand plain text. Existing methods are not effective enough because they understand relation types only at the textual level...
This paper addresses the task of assigning multiple labels of fine-grained named entity (NE) types to Wikipedia articles. To address the sparseness of the input feature space, which is particularly salient in fine-grained type classification, we propose to learn article vectors (i.e. entity embeddings) from the hypertext structure of Wikipedia using a Skip-gram model and incorporate them into the input...
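The Skip-gram model named in this abstract trains on (center, context) pairs drawn from a window over a sequence. A minimal sketch of the pair-generation step, under the assumption (hypothetical here) that the "sentence" is the ordered list of entities hyperlinked from one article:

```python
def skipgram_pairs(sequence, window=2):
    """Generate (center, context) training pairs as in the Skip-gram model.

    `sequence` might be, for illustration, the ordered list of entities
    hyperlinked from a single Wikipedia article.
    """
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center position itself
                pairs.append((center, sequence[j]))
    return pairs

links = ["Tokyo", "Japan", "Honshu", "Kanto_region"]
pairs = skipgram_pairs(links, window=1)
```

Training an embedding on such pairs makes entities that co-occur in the link structure land near each other in vector space, which densifies the otherwise sparse input features.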
General purpose Search Engines (SEs) crawl all domains (e.g., Sports, News, Entertainment) of the Web, but sometimes the informational need of a query is restricted to a particular domain (e.g., Medical). We leverage the work of SEs as part of our effort to route domain specific queries to local Digital Libraries (DLs). SEs are often used even if they are not the “best” source for certain types of...
Hyponymy is one of the most critical semantic relations and contributes significantly to semantic dictionaries, information retrieval, etc. In this paper, a method of extracting hyponymy based on the fusion of multiple data sources is proposed, which converts the extraction of hyponymy into the extraction of hypernyms for target words. First, candidate hypernyms for the target words are mined using a search engine,...
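A common way to mine candidate hypernyms from search-engine text is a Hearst-style lexical pattern such as "<hypernym> such as <hyponym>". The sketch below is a generic illustration of that pattern, not the paper's multi-source fusion method; the snippet text is invented for the example.

```python
import re

def hearst_hypernyms(text, target):
    """Find candidate hypernyms of `target` via the Hearst pattern
    '<hypernym> such as <hyponym>, <hyponym> and <hyponym>'."""
    pattern = re.compile(r"(\w+)\s+such as\s+([\w, ]+)")
    hypernyms = []
    for hyper, hypos in pattern.findall(text):
        hypo_list = [h.strip() for h in re.split(r",|and", hypos)]
        if target in hypo_list:
            hypernyms.append(hyper)
    return hypernyms

snippet = "The market sells fruits such as apples, pears and plums."
cands = hearst_hypernyms(snippet, "apples")
```

In a fusion setting, candidates mined this way from several sources would then be scored and filtered jointly rather than taken from a single snippet.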
Short message strings are widely prevalent in the age of social networking. Taking Facebook as an example, a user may have many other users in their contact list. However, in any given time frame, the user interacts with only a small subset of these users. In this paper, we propose a recommender system that determines which users have common interests based on the content of the short message strings...
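Matching users by the content of short messages can be illustrated with a toy token-overlap scorer. This is a hypothetical sketch, not the paper's system: real recommenders would use far richer content models than raw Jaccard overlap, and the threshold here is arbitrary.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user_msgs, others_msgs, threshold=0.1):
    """Rank other users by word overlap with `user_msgs` (toy scorer)."""
    user_tokens = {w.lower() for m in user_msgs for w in m.split()}
    scores = {}
    for other, msgs in others_msgs.items():
        tokens = {w.lower() for m in msgs for w in m.split()}
        scores[other] = jaccard(user_tokens, tokens)
    # keep users above the threshold, best match first
    return sorted((o for o, s in scores.items() if s >= threshold),
                  key=lambda o: -scores[o])

recs = recommend(
    ["love hiking trails", "new hiking boots"],
    {"alice": ["hiking this weekend"], "bob": ["stock market news"]},
)
```

Here "alice" is recommended because her message shares the token "hiking" with the user's history, while "bob" has no overlap.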
State-of-the-art machine-learning keyphrase extraction systems do not take into consideration the fact that some of these keyphrases may not be found in the text. Therefore these systems typically use a training set restricted to textual terms, reducing the learning capabilities of any inductive algorithm. Our research investigates ways to improve the accuracy of these systems by allowing classification...
Inferring potential links is a fundamental problem in social networks. In the link recommendation problem, the aim is to suggest a list of potential people to each user, ordered by the preferences of the user. Although various approaches have been developed to solve this problem, the difficulty of producing a ranking list with high precision at the top -- the most important consideration for real...
Named Entity Disambiguation (NED) aims at disambiguating named entity mentions in a text to their corresponding entries in a knowledge base such as Wikipedia. It is a fundamental task in Natural Language Processing (NLP) and has many applications such as information extraction, information retrieval, and knowledge acquisition. In the past decade, a number of methods have been proposed for the NED task...
Previous word sense disambiguation methods have several shortcomings: they generally do not consider the influence of word distance when computing the semantic correlation of the context, the context available for disambiguating an ambiguous word is limited, and using only part of the ambiguous context words makes word senses even more ambiguous. Therefore, this paper proposes the use of a dependency parse tree...
The main objective of a text summarization system is to identify the most important information in a given text and present it to the end user. In this paper, Wikipedia articles are given as input to the system, and extractive text summarization is performed by identifying text features and scoring the sentences accordingly. The text is first pre-processed to tokenize the sentences and perform stemming...
Tibetan-Chinese named entity extraction can effectively improve the performance of Tibetan-Chinese cross-language question answering systems, information retrieval, machine translation and other research. In the absence of a practical Tibetan named entity recognition system and a Tibetan-Chinese translation model, this paper proposes a method to extract Tibetan-Chinese entities based on comparable...
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. A great deal of work has been done on English Open IE, and the Chinese Open IE field is now attracting more and more researchers and scholars. In this paper we present a novel SCOERE (Semi-supervised...
Topic models have been shown to be a useful way of representing the content of large document collections, for example via visualisation interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a set of keywords, i.e. the top-n words with highest marginal probability within the topic. However, alternative topic...
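The standard keyword representation this abstract mentions, the top-n words with highest marginal probability within a topic, is straightforward to compute. A minimal sketch, with an invented toy topic distribution for illustration:

```python
def top_n_words(topic_dist, n=5):
    """Return the n words with the highest probability under a topic,
    i.e. the standard keyword representation of the topic."""
    ranked = sorted(topic_dist.items(), key=lambda kv: -kv[1])
    return [word for word, _prob in ranked[:n]]

# Toy word-probability distribution for one topic (values invented).
topic = {"neural": 0.12, "network": 0.10, "training": 0.08,
         "loss": 0.05, "cat": 0.01}
keywords = top_n_words(topic, n=3)
```

Topic browsers typically show exactly such a ranked keyword list as the label of each latent topic, which is the representation the alternatives discussed in the paper are compared against.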