Big data and its applications have become an emerging topic. Knowledge can be extracted from a high volume of information by using big data technologies. In practice, the MapReduce framework and its various extensions are the most popular approaches for big data. Among the different approaches, models based on fuzzy systems stand out for many applications. Fuzzy set theory...
Visual analytics plays a key role in bringing insights to audiences who are interested in and dedicated to data exploration. In the area of relational data, many advanced visualization tools and frameworks have been proposed to deal with such data features. However, the majority of those have not greatly considered the whole process from data-model mining to query utilizing on dimensions and data...
Computer software continues to grow in size, but it is difficult to collect information to support software development and maintenance. Data mining technology can be used to automatically discover knowledge from software testing data, which helps improve the software development process and software quality. First, correlation analysis is adopted to study the relevance among the...
This paper presents a novel approach for semi-supervised classification of remote sensing imagery using a {K-Means+(GMM-EM)} clustering cascade, followed by selection of a number of clustered pixels to be added to the training set according to their GMM responsibilities. The proposed method has the following steps: (a) clustering of the multispectral pixels using the cascade composed of K-means...
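The selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes the mixture has already been fitted, treats each pixel as a single scalar instead of a multispectral vector, and uses a hypothetical fixed threshold (`threshold=0.95`) as the selection rule, which the abstract does not specify. All function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given the sample x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    if total == 0.0:
        # Sample is numerically far from every component: no confident assignment.
        return [0.0] * len(joint)
    return [j / total for j in joint]


def select_confident(pixels, weights, means, variances, threshold=0.95):
    """Return (pixel, component) pairs whose top responsibility meets the threshold.

    The threshold rule is an assumption made for illustration only.
    """
    selected = []
    for x in pixels:
        r = responsibilities(x, weights, means, variances)
        k = max(range(len(r)), key=r.__getitem__)
        if r[k] >= threshold:
            selected.append((x, k))
    return selected


if __name__ == "__main__":
    # Hypothetical two-component mixture and three scalar "pixels".
    picked = select_confident([0.0, 5.0, 10.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
    print(picked)
```

In this toy setup the pixel halfway between the two component means receives a responsibility of 0.5 from each component, so it is left out of the training set, while the two pixels at the component means are kept.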
Streaming information flow allows identification of linguistic similarities between language pairs in real time, as it relies on pattern recognition of grammar rules, semantics and pronunciation, especially when analyzing so-called international terms, the syntax of the language family, and the transitivity of tenses between the languages. Overall, it provides backbone translation knowledge for building...
Blast furnace gas (BFG) is an important secondary energy source for iron and steel production. Establishing an effective model to describe the state of the BFG system is of great significance for maintaining system balance and stability. Considering the strong coupling characteristics of the blast furnace gas system and the high noise level in the industrial data, a simplex unscented Kalman filter-based Wang-Mendel...
The need for smart information retrieval systems is at odds with the difficulty of dealing with huge amounts of data. In this paper we present a Big Data Analytics architecture used to implement a semantic similarity search tool for natural language texts in the biomedical domain. The implemented methodology is based on Word Embeddings (WEs) models obtained using the word2vec algorithm. The system has...
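As a rough illustration of similarity search over word embeddings, the sketch below ranks texts by cosine similarity. It makes several assumptions not stated in the abstract: a text is represented by the average of its word vectors, and the three-dimensional vectors in `TOY_EMBEDDINGS` are invented toy values, not output of word2vec. All names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_vector(tokens, embeddings):
    """Average the embeddings of the known tokens; None if no token is known.

    Averaging is an assumption made for this sketch.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank_by_similarity(query_tokens, documents, embeddings):
    """Return (score, name) pairs sorted from most to least similar to the query."""
    query = text_vector(query_tokens, embeddings)
    scored = []
    for name, tokens in documents.items():
        doc = text_vector(tokens, embeddings)
        if query is not None and doc is not None:
            scored.append((cosine_similarity(query, doc), name))
    return sorted(scored, reverse=True)


# Invented toy vectors for illustration; real word2vec vectors have far more dimensions.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "protein": [0.1, 0.9, 0.2],
    "enzyme": [0.2, 0.8, 0.3],
}

if __name__ == "__main__":
    docs = {"oncology note": ["cancer"], "biochemistry note": ["protein", "enzyme"]}
    for score, name in rank_by_similarity(["tumor"], docs, TOY_EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```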
Combined, online social networks have billions of users worldwide, and the number keeps growing. Their users typically develop trust relationships with the accounts of other users. However, the large number of users and the potential gains from abusing these trust relationships have attracted the attention of cyber-criminals. Therefore, it is important to stop accounts from being compromised...
We propose PathML, an available bandwidth (i.e., unused capacity of an end-to-end path) estimation method based on a data-driven paradigm that uses machine learning with a large amount of data. An experiment over an operational LTE network was performed to compare our method with prior work.
Video emotion recognition, an emerging research field, has attracted more and more attention in recent years. However, such work is quite challenging, since human emotions are hard to differentiate precisely due to their complexity and diversity. Moreover, the expressions of sentiment in a content-rich video are sparse. Previous studies presented a number of approaches to try to learn human emotions...
Understanding the function of software code is the basis for software reuse. Topic modeling technologies can mine functional topics from source code and help developers comprehend the functional concerns of a software system and the corresponding implementations in source code. However, the lack of clear explanations makes these functional topics hard for developers to understand. Furthermore,...
This paper introduces the techniques of data cleaning and data extraction. The current state of domestic and international research in these two areas is reviewed and their future development considered. The following concepts are all explained: the basic principle of data cleaning, the framework models, the need for and the objectives of data cleaning, the testing method and the cleaning...
Venn and Euler diagrams are well-defined mathematical diagram types, which are the major representation methods of Set Theory. Venn and Euler diagrams are part of major Mathematics examinations in secondary education such as London Ordinary Level and SAT. Although computer assessment of different diagram types has been addressed, no such research has been done for Venn and Euler diagrams. In this...
Digitalisation of industrial processes, also called the fourth industrial revolution, is leading to the availability of large volumes of data containing measurements of many process variables. This offers new opportunities to gain deeper insights into process variability and its effects on quality and performance. Manufacturing facilities already use data-driven approaches to study process variability and...
The purpose of data mining is to explore, find and hence analyze relevant data from a massive data source using various technical means. This paper introduces the development of data mining to date, its functions, tasks and algorithms, as well as the process of data mining. The applications and problems of data mining are also presented and finally the potential future development of data mining technology...
This paper is a continuation of a previous paper on self-modeling systems, concerning mitigation methods for the Get Stuck Theorems, which are powerful theorems about the limits of knowledge representation. The First Get Stuck Theorem says that since there are only finitely many data structures of any given size, it follows that as a system tries to save more and more data / information / knowledge,...
Social bots are regarded as the most common kind of malware on social platforms. They can produce fake messages, spread rumours, and even manipulate public opinion. Recently, massive numbers of social bots have been created and widely spread on social platforms, where they have negative effects on public and netizen security. Bot detection aims to distinguish bots from humans, and it has attracted more and more attention in recent...
In this paper we propose a methodology for extracting complex sales expert rules by analyzing the data from the past lost/won deals stored in Customer Relationship Management Systems.
With the arrival of the big data era, data mining techniques have been widely used to build models for cyber security applications such as spam filtering, malware or virus detection, and intrusion detection. This project proposes a novel approach that uses randomness to improve the robustness of data mining models used in cyber security applications against attacks that try to evade detection by adapting...
Artifact-centric process models aim to describe complex processes as a collection of interacting artifacts. Recent developments in process mining allow for the discovery of such models. However, the focus is often on the representation of the individual artifacts rather than their interactions. Based on event data we can automatically discover composite state machines representing artifact-centric...