Streaming information flow allows linguistic similarities between language pairs to be identified in real time, as it relies on pattern recognition of grammar rules, semantics, and pronunciation, especially when analyzing so-called international terms, the syntax of the language family, and tense transitivity between the languages. Overall, it provides backbone translation knowledge for building...
Hotel reviews posted on accommodation reservation websites are considered valuable information for selecting accommodations and are also expected to be useful for marketing. Because hotel reviews vary widely in their expressions, a thesaurus has been necessary to obtain useful feature representations. Preparing a thesaurus, however, is laborious and requires occasional...
Synonym extraction is a fundamental research task that is helpful for text mining and information retrieval. In this paper, we propose a method to extract synonyms from text; the method employs spectral clustering and word2vec. First, the word2vec model is trained on a large-scale English Wikipedia corpus. Then, we extract keywords from a text and use the trained model to generate similarities among these...
The need for smart information retrieval systems is at odds with the difficulty of dealing with huge amounts of data. In this paper we present a Big Data Analytics architecture used to implement a semantic similarity search tool for natural language texts in the biomedical domain. The implemented methodology is based on Word Embeddings (WEs) models obtained using the word2vec algorithm. The system has...
Knowledge graph technology belongs to the field of artificial intelligence. It is widely used in semantic search and intelligent question answering. Constructing a Uyghur knowledge graph is of great value for Uyghur information processing and Uyghur application software development. First, this paper describes the definition and structure of the knowledge graph, then it reviews the related research...
In this paper, we consider the problem of automatically transforming unstructured data included in real-estate listings into data arranged in tabular form, as so-called information tables. Transformation is an important preprocessing stage enabling us to obtain data in a form accepted by many data mining and machine learning tools. In the presented approach, information tables represent information...
With the explosion of Web 2.0, customers are able to share their opinions and sentiments online. This has created new opportunities for companies and organizations to understand people's opinions of their products or services, which can help them improve their products or market strategy more effectively. However, the data on the Web is huge and unstructured, which makes it difficult to analyze automatically...
Public sentiment is regarded as an important measure for event detection, information security, policy making, etc. Analyzing public sentiment relies more and more on large amounts of multimodal content, in contrast to traditional text-based and image-based sentiment analysis. However, most previous works directly extract features from images as additional information for the text modality and then...
Email is a reliable, confidential, fast, free and easily accessible form of communication. Due to its wide use in personal, but most importantly, professional contexts, email represents a valuable source of information that can be harvested for understanding, reengineering and repurposing undocumented business processes of companies and institutions. Few researchers have investigated the problem of...
RESTful Web APIs have no description files like the WSDL used in traditional Web services. Although some REST API definition models have emerged recently, a structured description format is still lacking for the large number of existing Web APIs. Almost all Web APIs are documented in semi-structured web pages, and these documentation formats vary from site to site. It's hard for machines to...
In recent years, the freelancer economy has become the new normal. In the supply-driven freelancer marketplace, people sell their capabilities or labor as services on an internet platform to help others with particular micro-tasks. As this kind of human service ecosystem is in a fast growth stage, it is inundated with a variety of services whose quality is uneven. Quite often, when facing these services,...
The majority of clinical data is only available in unstructured text documents. Thus, their automated usage in data-based clinical application scenarios, like quality assurance and clinical decision support by treatment suggestions, is hindered because it requires high manual annotation efforts. In this work, we introduce a system for the automated processing of clinical reports of mamma carcinoma...
Frequent itemset discovery has recently become popular in the database community. Because real data is often affected by noise, in this paper we study how to find frequent itemsets over probabilistic databases under the Possible World Semantics. This is challenging because a probabilistic database may have an exponential number of possible worlds. Although several efficient algorithms have been proposed in the literature,...
As more and more companies become aware of the benefits of collecting and analyzing data, hiring employees with data analytics expertise is a key issue faced by HR practitioners. Although previous research has empirically highlighted the differences in knowledge and skill requirements between big data (BD) and business intelligence (BI) in English-speaking countries, few similar studies have been conducted...
With the rapid growth of service volumes and types, discovering services in an efficient and accurate manner has become a significant challenge in service computing. Service clustering is an important technology to improve the efficiency of service discovery. In this paper, we propose a new service clustering approach, which starts from service documents and is based on the functional semantics of...
Due to its wide use in personal, but most importantly, professional contexts, email represents a valuable source of information that can be harvested for understanding, reengineering and repurposing undocumented business processes of companies and institutions. Few researchers have investigated the problem of extracting and analyzing the process-oriented information contained in emails. In this paper,...
Relation extraction is very useful for many applications and has attracted much attention. The dominant prior methods for relation extraction were supervised methods which are relation-specific and limited by the availability of annotated training data. In this paper, we propose a method using hierarchical clustering to extract unbounded relations without relying on training data. The relation among...
Many new applications have recently been developed to satisfy users' special needs on the web. In this context, we are interested in personalized systems and particularly in Personalized Multi-Agent Systems (PMAS) characterized by collective and intelligent resolution in a distributed and parallel environment. This work assesses personalization, the most important characteristic of interface in multi-agent...
With the development of mobile communication technology, mobile phones play an increasingly important role in people's daily lives. The user's location and its semantics are very important to Location Based Services (LBS), and this has inspired a tremendous amount of research effort over the last decade on analyzing large-scale trajectory data to mine this information. Existing research has achieved good...
Operation services are reusable and shareable units of configuration code executed by configuration management tools (CMTs), achieving continuous deployment and continuous delivery. With the prevalence of DevOps (Development and Operations), thousands of operation services have been developed for various software systems, and they are publicly available through the online repositories of popular CMTs...