The construction of cut trees (also known as Gomory-Hu trees) for a given graph enables the minimum-cut size of the original graph to be obtained for any pair of vertices. Cut trees are a powerful back-end for graph management and mining, as they support various procedures related to the minimum cut, maximum flow, and connectivity. However, the crucial drawback with cut trees is the computational...
With the rapid development of computer technology, web services have been widely used. In these applications, uncertain data arrives in the form of streams. In view of this, we present a new generalized data structure, the PSUF-tree, to store uncertain data streams. All itemsets in the recent window are contained in a global PStree in a condensed format, and a header table is established in which...
Sorting is applied in daily life, from ordering simple lists to real-world applications. Sorting presents data in an ordered fashion, which helps in analysis or allows faster computation. Radix sort is a non-comparative integer sorting algorithm that runs in linear time. Radix sort performs a modulus operation on each data item to extract the digit at a specific position and maintain...
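The digit-extraction step mentioned in this abstract can be illustrated with a minimal sketch of the textbook least-significant-digit radix sort. This is not the variant proposed in the paper, whose details are truncated here; the function name `radix_sort` and the `base` parameter are assumptions made for illustration, and the sketch handles non-negative integers only.

```python
def radix_sort(values, base=10):
    """Textbook LSD radix sort for non-negative integers (illustrative only)."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    result = list(values)
    if not result:
        return result
    largest = max(result)
    place = 1  # 1 = units digit, base = next digit, and so on
    while largest // place > 0:
        # One bucket per possible digit value; appending keeps the pass stable.
        buckets = [[] for _ in range(base)]
        for v in result:
            digit = (v // place) % base  # modulus extracts the current digit
            buckets[digit].append(v)
        # Concatenate buckets in digit order to form the input of the next pass.
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result
```

Each pass is linear in the number of items, and the number of passes equals the number of digits in the largest key, which is why the method is usually described as linear for fixed-width keys.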
Sequential pattern mining has been used in bioinformatics to discover frequent gene regulation sequential patterns based on time course microarray datasets. While mining frequent sequences is important in biological studies for disease treatment, to date, most of the approaches do not consider the importance of the genes with respect to a disease being studied when identifying gene regulation sequential...
Enterprises now generate large volumes of high-dimensional data on a regular basis. Complex data mining and analysis techniques are used to analyse this data feasibly. Feature selection aids in this by providing a reduced representation of the data while maintaining its integrity. We propose a graph-based feature selection algorithm utilizing feature intercorrelation to construct a weighted...
In association rule mining, the task of finding frequent itemsets in a dynamic database is very important because updates may not only invalidate some existing rules but also make other rules relevant. In this paper, we propose a new algorithm to maintain the frequent itemsets of a dynamic database under simultaneous record insertion and deletion. Basically, the proposed algorithm...
Software changes throughout its life cycle. Therefore, identifying source code changes is a key issue in software evolution analysis. However, few current change-analysis studies focus on dynamic-language software. In this paper, we pay attention to the fine-grained source code changes of Python software. We implement an automatic tool named PyCT to extract 77 kinds of fine-grained...
In this paper, we study a new problem of continuous learning from doubly-streaming data where both data volume and feature space increase over time. We refer to the doubly-streaming data as trapezoidal data streams and the corresponding learning problem as online learning from trapezoidal data streams. The problem is challenging because both data volume and data dimension increase over time, and existing...
Today, scientific and business applications generate huge amounts of data. Users of a data grid, who are geographically distributed across the grid, need such data. Ensuring efficient access to this distributed data is therefore one of the most important challenges in data grid networks. Data replication algorithms are known as the most common method used to overcome this problem. They distribute several...
Model precision in a classification task is highly dependent on the feature space that is used to train the model. Moreover, whether the features are sequential or static dictates which classification method can be applied, as most machine learning algorithms are designed to deal with only one of these two types of data. In real-life scenarios, however, it is often the case that both static...
In this paper, we propose a novel method to extract keyframes from motion capture data for people to better visualize and understand the content of the motion. It first applies a Butterworth filter to remove the noise in the motion capture data, then carries out principal component analysis (PCA) to reduce the dimension. By detecting the zero-crossing points of the velocity in the principal components,...
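The zero-crossing step mentioned in this abstract can be sketched as follows. This is a simplified illustration and not the paper's pipeline: it assumes a single one-dimensional component that has already been filtered and reduced, approximates velocity by finite differences, and uses the hypothetical name `velocity_zero_crossings`. Plateaus, where the velocity is exactly zero, are ignored.

```python
def velocity_zero_crossings(signal):
    """Return sample indices where the finite-difference velocity changes sign.

    `signal` is assumed to be one already-denoised 1-D component (illustrative).
    """
    # Velocity between consecutive samples: velocity[k] = signal[k+1] - signal[k].
    velocity = [b - a for a, b in zip(signal, signal[1:])]
    crossings = []
    for i in range(1, len(velocity)):
        # A strict sign change means sample i is a local peak or trough.
        if velocity[i - 1] * velocity[i] < 0:
            crossings.append(i)
    return crossings
```

The returned indices mark local extrema of the component, which is what makes them natural keyframe candidates in this kind of approach.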
Big data poses many technical challenges that confront both academic research communities and commercial IT deployment. Data streams with the curse of dimensionality are found to be the root sources of big data. The commonly used procedure for data sourced from data streams is to continuously build batch-based models and induce algorithms, which is infeasible for real-time data mining....
Dynamically updating knowledge is an active research topic in data mining. This paper proposes a method for updating the P-dominated and P-dominating sets of the Dominance-based Rough Set Approach (DRSA). Examples are used to validate the approach. They show that it can simplify computation by avoiding unnecessary computing steps.
The tolerance class is a basic concept in rough set theory for incomplete information systems. Efficient computation of tolerance classes is vital for improving the performance of knowledge reduction and other related tasks. To speed up tolerance class calculation, an improved static algorithm is first developed, followed by a novel incremental algorithm, which can update rapidly...
This paper proposes adaptive robust models for the adaptive identification of nonstationary systems. The proposed models can be used for solving Dynamic Data Mining and Data Stream Mining tasks. They are characterized by computational simplicity and high-speed operation, which allow signal processing in on-line mode.
Dynamic graphs are used to represent changing relational data. In order to create a dynamic graph representing relationships or interactions over time, it is necessary to choose a method of adding new data and removing, or otherwise de-emphasizing, past data to decrease its influence. In particular, the question of aging edges is new to dynamic graphs and has not been thoroughly studied. In this work,...
The discovery of association rules and frequent patterns is a long-standing topic in the database community. As real data is often affected by noise, in this paper we study how to find frequent patterns and generate association rules over a probabilistic database under the Possible World Semantics. This is technically challenging, since a probabilistic database can have an exponential number of possible worlds. Although...
Data mining is one of the significant research domains in computer science and is defined as the extraction of hidden knowledge from large data repositories. Important data mining techniques include classification, clustering, association rule generation, summarization, and time series analysis. Association rules are used to determine frequent patterns, associations, and correlations...
Network-based malware classification plays a more important role in improving system security than system-based malware classification. The vast majority of malware needs network activity in order to accomplish its purpose (e.g., downloading malware, connecting to a C&C server, etc.). Many malware classification approaches based on network behavior have thus been proposed. Nevertheless, they merely...
Based on the dynamic time warping (DTW) matching method, a novel appliance identification algorithm for low frequency sampling load data is proposed. First, residential load sequences are segmented into subsequences composed of single appliance load profiles and multi-appliance load profiles. Then, reference load sequences of all candidate appliances, which have identical lengths, are generated before...
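The matching idea behind this abstract can be illustrated with a minimal sketch of classic dynamic time warping. This is the textbook algorithm, not the paper's full identification procedure, whose segmentation and reference-generation steps are truncated here. The function names `dtw_distance` and `identify`, the absolute-difference cost, and the appliance names in the usage note are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def identify(load, references):
    """Return the key of the reference sequence closest to `load` under DTW.

    `references` is a hypothetical mapping from appliance name to load profile.
    """
    return min(references, key=lambda name: dtw_distance(load, references[name]))
```

For example, `identify([0, 5, 5, 0], {"kettle": [0, 5, 0], "lamp": [0, 1, 0]})` selects `"kettle"`, because warping lets the repeated 5 align with the single 5 in the reference at no extra cost.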