Following the big data trend, the business value of data has become a hot research field in recent years. The novel concept of the Data Jacket, introduced by Ohsawa et al., addressed the difficult problem of data transactions that arises from a particular characteristic of data, i.e. the need to safeguard privacy. To understand the mechanism of the data market, some researchers have proposed a gamified...
Building on existing research on Chinese text clustering, this paper proposes an improved algorithm for optimizing short term semantic clustering based on social media. A weighting-factor method is introduced to optimize the text distance formula and the related mathematical proof, the calculation process optimization design from text, written text distance calculation algorithm, the simulation...
A large software project usually has many kinds of textual learning resources about its API, such as tutorials, mailing lists, and user forums. Text retrieval technology allows developers to search these API learning resources for related documents using free-text queries, but it suffers from the lexical gap between search queries and documents. In this paper, we propose a novel approach for improving...
Deep neural networks have advanced many computer vision tasks because of their compelling capacity to learn from large amounts of labeled data. However, their performance is not fully exploited in semantic image segmentation, where per-pixel label maps are expensive to obtain and the scale of the training set is therefore limited. To reduce labeling efforts, a natural solution is to collect additional images from...
The concept of persistent identification is increasingly important for research data management. Initially, it was considered only as a persistent naming mechanism for research datasets, achieved by providing an abstraction for the addresses of research datasets. However, recent developments in research data management have led persistent identification to move towards a concept which...
Word embeddings are a well-known set of techniques widely used in natural language processing (NLP), and word2vec is a computationally efficient predictive model for learning such embeddings. This paper explores the use of word embeddings in a new scenario. We create a vector representation of Internet Domain Names (DNS) by taking the core ideas from NLP techniques and applying them to real anonymized...
We present a comprehension-based framework for measuring semantic similarity between text documents. In various situations, vector-based similarity measures fail to capture deep semantic relations between terms. Our computational comprehension model processes textual content in a way that resembles human readers, paying attention to context, location, and acquisition time of semantic concepts....
With the rapid development of the Internet, obtaining valuable information from massive volumes of messages has become a major problem to be solved in the era of information explosion. This paper introduces the development route of information extraction technology and discusses four categories of Chinese entity relation extraction technologies in depth. Finally, the advantages and disadvantages of different...
This paper studies cross-lingual semantic similarity (CLSS) among five European languages (English, French, German, Spanish and Italian) via unsupervised word embeddings from a cross-lingual lexicon. The vocabulary in each language is projected onto a separate high-dimensional vector space, and these vector spaces are then compared using several different distance measures (i.e., correlation,...
Research on malicious comments in Sina Weibo is very important because a large number of malicious comments seriously undermine the user experience on Sina Weibo. Building on a malicious comment detection technique based on semantic information, this paper presents a different technique that improves the process of malicious dictionary construction and the process of malicious comment detection...
Online reviews play a crucial role in helping consumers make purchase decisions. However, a severe problem, the Internet Water Army (a large number of paid posters who write inauthentic reviews), has recently emerged on many e-commerce websites and dramatically undermines the value of user reviews. Although the term Internet Water Army originated in China, some other countries have also suffered from this...
Synonym extraction is a fundamental research task that benefits text mining and information retrieval. In this paper, we propose a method to extract synonyms from text that employs spectral clustering and word2vec. First, the word2vec model is trained on a large-scale English Wikipedia corpus. Then, we extract keywords from a text and use the trained model to generate similarities among these...
While there is a large amount of text data on the Internet, people need to organize it into experience-based categories. However, a flat category structure cannot satisfy the needs of modern information management. To solve this problem, we propose a hierarchical classification process with a strategy, called candidates, that relieves the blocking problem. In addition, we establish the description...
Predicting meme bursts is highly relevant to developing security-related detection and early-warning capabilities. In this paper, we propose a feature-based method for real-time meme burst prediction, namely “Semantic, Network, and Time” (SNAT). By considering the potential characteristics of bursty memes, such as the semantics and spatio-temporal characteristics during their propagation, SNAT is...
Knowledge graph technology belongs to the field of artificial intelligence. It is widely used in semantic search and intelligent question answering. Constructing a Uyghur knowledge graph is of great value for Uyghur information processing and Uyghur application software development. First, this paper describes the definition and structure of the knowledge graph, then it reviews the related research...
Cross-media retrieval is a useful technology that helps people find the information they need in huge amounts of multimodal data more efficiently. A common cross-media retrieval framework first maps features of different modalities into an isomorphic semantic space so that the similarity between heterogeneous data can be measured. For most semantic-space-based methods, the mapping...
Internet of Things (IoT) sensors are becoming commonplace in people's daily lives. Many cities have already deployed very large numbers of IoT sensors as part of smart city initiatives. However, the lack of semantics in the presentation of IoT-based sensory data makes the data hard for the general public to interpret. Adding semantics to the IoT sensory data remains a challenge for smart cities. In this...
With the explosive growth of information on the Internet, it is becoming more and more important to improve the efficiency of information acquisition. Automatic text summarization provides a good means for quick acquisition of information through compression and refinement. While existing methods for automatic text summarization achieve strong performance on short sequences, they are facing...
Sentiment analysis is an important task in natural language processing that promises great value to areas such as business, politics and other fields. The prevalence of the Internet has led people to prefer expressing their opinions and sentiments online, for example by tweeting on social media and commenting on products. However, the discourse of users on social media...
Despite the significant contribution of specialized ontologies and text mining methods, evaluating the semantic similarity of genes remains difficult because of the complex functions in which genes are involved. A less exploited resource is Wikipedia, which stores more than 10400 articles about human genes: each gene name identifies the corresponding Wikipedia page summarizing the gene's properties...