In this work, a new method for data clustering based on principal curves is presented. Principal curves are a nonlinear generalization of Principal Component Analysis and may also be regarded as continuous versions of 1-D self-organizing maps. The proposed method divides the principal curves extracted by the k-segments algorithm into two or more curves, according to the number of clusters defined...
Web data mining is difficult for online analytical processing (OLAP) with big data. Mining is simplified by approximating big-data databases for the knowledge discovery process, particularly with MapReduce. The approximate information is fuzzy rather than probabilistic. In this paper, fuzzy web data mining of association rules is discussed for big data. The query processing is discussed...
UF-growth is a tree-based exact algorithm for mining frequent patterns from uncertain data. While it directly calculates the expected support of an item set, it requires a significant amount of storage space to capture all existential probability values among the items. To eliminate the extra space requirement of UF-growth, the CUF-growth algorithm combines nodes with the same item by storing an upper...
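The expected support mentioned in the abstract above can be illustrated with a minimal sketch. It uses the common definition (sum over transactions of the product of the items' existential probabilities) and assumes items are independent. It is a brute-force calculation, not the UF-growth or CUF-growth tree structure; the function name `expected_support` and the sample data are hypothetical.

```python
from math import prod

def expected_support(itemset, transactions):
    """Brute-force expected support of `itemset` over uncertain transactions.

    Each transaction maps an item to its existential probability.
    Assumption: item probabilities within a transaction are independent.
    """
    total = 0.0
    for t in transactions:
        # A transaction contributes only if it contains every item.
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total

# Hypothetical uncertain database with three transactions.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "c": 0.8},
    {"a": 0.5, "b": 0.4, "c": 1.0},
]

print(expected_support(["a", "b"], transactions))
```

For the item set {a, b}, only the first and third transactions contribute, giving 0.9 × 0.5 + 0.5 × 0.4.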
An organization needs to understand its customers' behavior, preferences and future needs, which depend upon past behavior. Web Usage Mining is an active research topic in which customer session clustering is done to understand customers' activities. This paper investigates the problem of mining frequent patterns and especially focuses on reducing the number of scans of the database and reflecting...
The original sequential pattern mining model considers only the occurrence frequency of sequential patterns and disregards their occurrence periodicity. We propose the asynchronous periodic sequential pattern mining model to discover sequential patterns that not only occur frequently but also appear periodically. For this mining model, we propose a pattern-growth mining algorithm to mine...
A method for detecting redundant attributes in relational datasets using the fractal ideology is studied. Based on the fractal dimension of a dataset and its variations, an algorithm for detecting redundant attributes is presented. The work has the following features: datasets with numeric and discrete attributes can be processed; an approach based on depth-equal data dimension division (i.e., the...
Data mining is the process of extracting interesting patterns or knowledge from large amounts of data. With the development of data mining technology, an increasing amount of data can be mined to reveal potential information about the user, so the user's privacy may easily be violated. Privacy Preserving Data Mining (PPDM) is used to mine the potential valuable knowledge without...
A top-k dominating query selects the k data objects that dominate the highest number of objects in a dataset. It is a decision-support query, since it gives data analysts an effective way of finding significant objects. The search not only examines large upper bounds early, which leads to earlier identification of results, but also eliminates partial dominance relationships between...
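A top-k dominating query can be sketched with a naive quadratic scan. This is only an illustration of the query semantics, not the pruning method the abstract describes. It assumes the usual definition of dominance (no worse in every dimension, strictly better in at least one) and that smaller values are better; the names `dominates` and `top_k_dominating` and the sample points are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one.

    Assumption: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_dominating(points, k):
    """Return the k (score, point) pairs with the highest dominance scores.

    The score of a point is the number of other points it dominates.
    Naive O(n^2) comparison; ties keep their input order.
    """
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]

# Hypothetical two-dimensional dataset.
points = [(1, 1), (2, 3), (3, 2), (4, 4), (2, 2)]
print(top_k_dominating(points, 2))
```

Here (1, 1) dominates the four other points and (2, 2) dominates three, so these two are returned for k = 2.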
In this paper, we consider data mining from large discrete trajectory data. We study closed pattern mining for the class of trajectory envelope patterns. First, we introduce the basic definition of trajectory data. Then, we present a depth-first search algorithm that finds all trajectory envelope patterns in a given database that satisfy constraints on maximum width, minimum length, and minimum frequency...
The aim of feature selection applied to a classification task is to find a minimal subset of features to use for classification. Some researchers have focused on selecting a useful set of attributes, others on selecting a relevant and non-redundant set of attributes. We propose a heuristic construction algorithm for selecting a useful and non-redundant subset of features. The...
Studies show that finding frequent sub-graphs in an uncertain graph database is an NP-complete problem. Finding the frequency at which these sub-graphs occur in an uncertain graph database is also computationally expensive. This paper focuses on mining frequent sub-graph patterns in DBLP uncertain graph data using an approximation-based method. The frequent sub-graph pattern mining problem...
Existing trajectory prediction algorithms mainly employ kinematical models to approximate real-world routes and always ignore spatial and temporal distance. To overcome the drawbacks of existing trajectory prediction approaches, this paper proposes a novel trajectory prediction algorithm. It works as follows: (1) mining the interesting regions from trajectory data sets; (2) extracting the trajectory...
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands...
The pairwise mining problem is to discover pairwise objects having measures greater than the user-specified minimum threshold from a collection of objects. It is essential in a large variety of database and data-mining applications. Of late, there has been increasing interest in applying a Locality-Sensitive Hashing (LSH) scheme for pairwise mining. LSH-type methods have shown themselves to be simply...
Cache memory is used to bridge the speed gap between the processor and main memory. Cache memory has a hierarchical structure, including level 1 cache (L1), level 2 cache (L2), etc. An effective page replacement algorithm results in efficient utilization of the cache. L1 has rich temporal locality while L2 has poor temporal locality, thus the same replacement algorithm for both levels...
The recent proliferation of graph data in a wide spectrum of applications has led to an increasing demand for advanced data analysis techniques. In view of this, many graph mining techniques, such as frequent subgraph mining and correlated subgraph mining, have been proposed. In many applications, both frequency and correlation play an important role. Thus, this paper studies a new problem of mining...
Many organizations collect large amounts of data to support their business and decision-making processes. The data collected from various sources may have data quality problems. These issues become prominent when various databases are integrated. The integrated databases inherit the data quality problems that were present in the source databases. The data in the integrated systems need...
This paper presents a novel generalized steepest ascent algorithm for selecting a subset of features. Our proposed algorithm improves upon the prior steepest ascent algorithm by selecting a better starting search point and performing a more thorough search. For any given criterion function used to evaluate the effectiveness of a selected feature subset,...
Fractal dimension is widely adopted in spatial databases and data mining, among others, as a measure of dataset skewness. State-of-the-art algorithms for estimating the fractal dimension exhibit linear runtime complexity, whether based on box-counting or approximation schemes. In this paper, we revisit a correlation fractal dimension estimation algorithm that redundantly rescans the dataset and, extending...
In a large telecommunication network management system, substantial data containing information on network traffic, network element status, device running status and all other messages are continuously sent from each special network management system to the integrated network management system. This kind of data is typically stream data. Current network management systems employ traditional...