This paper investigates the colorization problem, i.e. converting a grayscale image to a color version. This is a difficult problem and normally requires manual adjustment to achieve artifact-free quality. For instance, it normally requires human-labeled color scribbles on the grayscale target image or a careful selection of color reference images. The recent learning-based colorization...
With the rapid development of the Internet, obtaining valuable information from massive volumes of messages has become a major problem to be solved in the era of information explosion. This paper introduces the development route of information extraction technology and discusses four categories of Chinese entity relation extraction technologies in depth. Finally, the advantages and disadvantages of different...
Online reviews play a crucial role in helping consumers make purchase decisions. However, a severe problem, the Internet Water Army (a large number of paid posters who write inauthentic reviews), has recently emerged on many e-commerce websites, dramatically undermining the value of user reviews. Although the term Internet Water Army originated in China, some other countries have also suffered from this...
Synonym extraction is a fundamental research task that is helpful for text mining and information retrieval. In this paper, we propose a method to extract synonyms from text; the method employs spectral clustering and word2vec. First, the word2vec model is trained on a large-scale English Wikipedia corpus. Then, we extract keywords from a text and use the trained model to generate similarities among these...
Predicting meme bursts is of great relevance to developing security-related detection and early-warning capabilities. In this paper, we propose a feature-based method for real-time meme burst prediction, namely “Semantic, Network, and Time” (SNAT). By considering the potential characteristics of bursty memes, such as the semantics and spatio-temporal characteristics during their propagation, SNAT is...
Knowledge graph technology belongs to the field of artificial intelligence. It is widely used in semantic search and intelligent question answering. Constructing a Uyghur knowledge graph is of great value for Uyghur information processing and Uyghur application software development. First, this paper describes the definition and structure of the knowledge graph; then it reviews the related research...
Cross-media retrieval is a useful technology that helps people find the information they want in huge amounts of multimodal data more efficiently. A common cross-media retrieval framework first maps features of different modalities into an isomorphic semantic space so that the similarity between heterogeneous data can be measured. For most semantic-space-based methods, the mapping...
With the explosive growth of information on the Internet, improving the efficiency of information acquisition is becoming more and more important. Automatic text summarization provides a good means for quick acquisition of information through compression and refinement. While existing methods for automatic text summarization achieve strong performance on short sequences, they are facing...
Over the past decade, numerous systems have been proposed to detect and subsequently prevent or mitigate security vulnerabilities. However, many existing intrusion or anomaly detection solutions are limited to a subset of the traffic due to scalability issues, hence failing to operate at line-rate on large, high-speed datacentre networks. In this paper, we present a two-level solution for anomaly...
With the development of the Internet, detecting web-based anomalies is vital for Internet security. Clustering based on manual feature extraction has been verified as an effective way to detect new anomalies. However, the representations of these features cannot express the semantic information of the URLs. In addition, few studies try to cluster the anomalies into specific types like SQL-injection...
User behavioral analysis is expected to be a key technique for identity theft detection on the Internet, especially in mobile social networks (MSNs). While traditional methods prefer to use explicit behaviors, a series of behaviors implicit in a user's texts can probably provide much more accurate identification. These implicit behaviors can be mined from texts by LDA. Besides the latent feature in texts,...
Medical synonym identification has been an important part of medical natural language processing (NLP). However, in the field of Chinese medical synonym identification, there are problems such as low precision and low recall. To solve these problems, in this paper, we propose a method for identifying Chinese medical synonyms. We first selected 13 features including Chinese and English features. Then...
Topic evolution analysis can identify and track trends in hot topics in depth, which helps provide the entire route of topic evolution and reasonable advice for network public opinion monitoring. The OLDA topic model is a commonly used tool for topic evolution analysis, but it suffers from the mixing of old and new topics and from massive numbers of redundant words. Considering these problems, this paper proposes...
Cross-modal retrieval, which aims to solve the problem that the query and the retrieved results are from different modalities, is becoming more and more essential with the development of the Internet. In this paper, we mainly focus on the exploration of high-level semantic representation of image and text for cross-modal matching. Deep convolutional image features and Fisher Vector with neural word embeddings...
With the popularity of mobile devices and the quick growth of the mobile Web, users can now browse news wherever they want, so their news preferences are usually related to their geographical contexts. Consequently, much research effort has been devoted to location-aware news recommendation, which recommends to users the news happening nearest to them. Nevertheless, in a real-world context, users’ news...
With the rapid spread of the Internet and the mobile web, the number of news pages is increasing quickly and news content is becoming highly dynamic. It is difficult for ordinary users to obtain specific information contained in a mass of news streams, so it is of great research significance to study how to analyze massive news and how to detect and track news hotspots automatically. This research proposes...
Quite a number of recent works have concentrated on the task of recommending to Twitter users whom they should follow, including the WTF (Who To Follow) service provided by Twitter. Recommenders are based either on the user's network structure, or on some notion of topical similarity with other users, or on both. We present a method for analysis of Twitter users supported by a hierarchical representation...
FAQs are lists of common questions and answers on particular topics. Today one can find them on almost all websites, and they can be a great tool for giving information to users. Questions in FAQs are usually identified by the site administrators on the basis of the questions that are asked by their users. While such questions can respond to required information about a service,...
Enormous efforts by human volunteers have made Wikipedia a treasure trove of textual knowledge. Relation extraction, which aims at extracting structured knowledge from the unstructured texts in Wikipedia, is an appealing but quite challenging problem because it is hard for machines to understand plain text. Existing methods are not effective enough because they understand relation types at the textual level...
Many enduringly popular social Q&A websites, such as Yahoo! Answers and ResearchGate, have become valuable knowledge repositories where people search for answers to questions about various aspects of life. Finding the most relevant questions is often a non-trivial task, and a fine-grained classification system of questions will be an important aid. Existing work has mainly focused on classifying...