In recent years, the use of Graphics Processing Units (GPUs) for data mining tasks has become popular. With modern processors integrating both CPUs and GPUs, it is also important to consider what tasks benefit from GPU processing and which do not, and apply a heterogeneous processing approach to improve the efficiency where applicable. Similarity search, also known as k-nearest neighbor search, is...
Adversarial examples are augmented data points generated by imperceptible perturbation of input samples. They have recently drawn much attention from the machine learning and data mining communities. Being difficult to distinguish from real examples, such adversarial examples could change the prediction of many of the best learning models, including state-of-the-art deep learning models. Recent attempts...
This paper studies one-scan approximation algorithms for streaming data mining (SDM). Despite the importance of pattern discovery in streaming data, this issue has not yet been sufficiently addressed in the big data community. In this context, we briefly review the previously proposed SDM methods. A recent work addresses their limitations using the technique of online compression. It is based...
We propose in this paper a new, alternative approach for the problem of finding a set of representative objects in large datasets. To do so, we first formulate the general Instance Selection Problem (ISP) and then study three variants of that in order to select instances from different regions of the data. These variants aim at finding the objects located in three very different locations of the data:...
Flows in traffic dumps of a high-speed Internet backbone channel were analyzed. Streams were classified into three groups according to the amount of transmitted information. The density function was calculated for the number of packets transmitted by different classes of flows (time series) according to the method of image sources and the Rosenblatt-Parzen approximation. The obtained results show non-stationarity...
In this paper we present novel experimental results comparing two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Experiments were conducted on 12 data sets with many missing attribute values using the MLEM2 rule induction system. In the experiments, three kinds of probabilistic approximations were used: singleton, subset and concept; with...
An arbitrary m×n Boolean matrix M can be decomposed exactly as M = U○V, where U (resp. V) is an m×k (resp. k×n) Boolean matrix and ○ denotes the Boolean matrix multiplication operator. We first prove an exact formula for the Boolean matrix J such that M = M○J^T holds, where J is maximal in the sense that if any 0 element in J is changed to a 1 then this equality no longer holds. Since minimizing k...
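The Boolean matrix product used in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition, (U○V)[i][j] = OR over t of (U[i][t] AND V[t][j]); the function name `boolean_product` and the example matrices are hypothetical and are not taken from the paper.

```python
def boolean_product(U, V):
    """Boolean matrix product of an m x k matrix U and a k x n matrix V.

    Entry (i, j) is True when some index t has U[i][t] and V[t][j] both set.
    """
    k = len(V)
    n = len(V[0])
    return [
        [any(row[t] and V[t][j] for t in range(k)) for j in range(n)]
        for row in U
    ]


# Illustrative factors (assumed values): U is 3 x 2, V is 2 x 3.
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0],
     [0, 1, 1]]

M = boolean_product(U, V)
for row in M:
    print([int(x) for x in row])
```

Here each row of M is the element-wise OR of the rows of V selected by the corresponding row of U, which is why a small k gives a compact description of M.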
Outlier detection is now widely used in various fields and is attracting growing research interest. Density-based and distance-based methods are the most frequently used outlier detection methods. In big data, both the size and the dimensionality of the data are very large. These features make the conventional methods unsuitable for big data. According to the...
Much of the data of scientific interest, particularly when independence of data is not assumed, can be represented in the form of networks where data nodes are joined together to form edges corresponding to some kind of associations or relationships. Such information networks abound, like protein interactions in biology, web page hyperlink connections in information retrieval on the Web, cellphone...
This paper focuses on eliminating data redundancy in covering information systems. Firstly, we enumerate several common covering reduction methods and analyze the connections among them. Secondly, we take both the explicit and implicit attribute values into consideration, and we obtain a network topology (abbreviated as NT) of the covering information system. Through NT, we turn the covering information system...
Data clustering is usually time-consuming since it by default needs to iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast, quality-ensured results. In this paper, we propose to apply approximation techniques to data clustering to obtain a trade-off between clustering efficiency and result quality, along with online accuracy estimation...
In this paper we propose an architecture for a hybrid generalized additive neuro-fuzzy system. This system is a hybrid of the Wang-Mendel neuro-fuzzy system and the generalized additive models of Hastie-Tibshirani. The proposed hybrid generalized additive neuro-fuzzy system can be used for solving different tasks of computational intelligence and data stream mining. The results of experimental modelling confirm...
The idea of opposition-based learning was introduced 10 years ago. Since then a noteworthy group of researchers has used some notions of oppositeness to improve existing optimization and learning algorithms. Among others, evolutionary algorithms, reinforcement agents, and neural networks have been reportedly extended into their “opposition-based” version to become faster and/or more accurate. However,...
Searching for solutions that optimize a continuous function can be difficult due to the infinite search space, and can be further complicated by the high dimensionality in the number of variables and complexity in the structure of constraints. Both deterministic and stochastic methods have been presented in the literature with a purpose of exploiting the search space and avoiding local optima as much...
In this paper, we consider the problem of model reduction of large scale systems, such as those obtained through the discretization of PDEs. We propose a randomized proper orthogonal decomposition (RPOD) technique to obtain the reduced order models by randomly choosing a subset of the inputs/outputs of the system to construct a suitable small sized Hankel matrix from the full Hankel matrix. It is...
This paper presents an efficient computational method for time series clustering and its application to research funding of universities directly under the Ministry of Education of the People's Republic of China. The presented approach is based on the extraction of trend features from time series data with Haar wavelet decomposition and their use in feature-based agglomerative hierarchical clustering of monthly...
Pointer alias analysis is a well-researched problem in the area of compilers and program verification. Many recent works in this area have focused on flow-sensitivity due to the additional precision it offers. However, a flow-sensitive analysis is computationally expensive, which prevents its use in larger programs. In this work, we observe that a number of object sets, consisting of tens to hundreds...
Knowledge Management is a very active domain these days due to the increasing asset value of the knowledge available within an organization. Knowledge is now treated as a vital asset that can increase an organization's competitive advantage. The potential that knowledge management has for improving fisheries management is increasingly being recognized. There are relatively few studies that have specifically...
Hepatitis C virus is a massive health issue affecting significant portions of the world's population. Applying data preprocessing and feature reduction techniques, and generating rules based on the selected features for classification tasks, are considered important steps in knowledge discovery in databases. This paper highlights a Rough-Granular Neural Networks model that incorporates Rough Sets...
Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures, that are based on a geometric model, such as lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can...