Web usage mining plays an important role in analyzing and improving the performance of web applications. This paper first introduces the concept of web usage mining, along with the Improved AprioriAll algorithm, which is the base algorithm for the proposed tool, and its limitations. The improved approach is discussed which integrates sequential pattern mining with...
Recent developments in the storage of system data in the Navy's data repository, LEAPS, using the FOCUS product meta-model have opened the doors to graph-theory applications in the design of Navy ship systems in the early stages of design. In this paper, we demonstrate the ability to extract graphs from ship data and present pertinent applications of such graphs including a vulnerability metric for...
In this paper, we present a novel graph-based clustering algorithm (GC). GC contains two main steps: the first creates a graph and identifies the key nodes as centers; the second assigns every data point to a center. The centers are selected from a graph view. Experimental results on 8 datasets demonstrate that GC can outperform k-means, k-medoids, Hierarchical Clustering...
Enterprises now generate large volumes of high-dimensional data on a regular basis. Complex data mining and analysis techniques are used to make analysis of this data feasible. Feature selection aids in this by providing a reduced representation of the data while maintaining its integrity. We propose a graph-based feature selection algorithm utilizing feature intercorrelation to construct a weighted...
Frequent pattern mining outputs a large number of redundant frequent patterns, which complicates data mining and knowledge discovery. Incorporating edge connectivity information into the gSpan algorithm, we propose an efficient algorithm for mining frequent k-edge-connected subgraphs in a given graph dataset. Specifically, when the DFS code tree goes through depth-first...
IT services delivery is a complex ecosystem that engages hundreds of thousands of system administrators in service delivery centers globally, managing thousands of IT systems on behalf of customers. Such large-scale hosting environments require a flexible identity management system to provision the necessary access rights and ensure an organization's compliance posture. A popular and effective access control scheme...
In many applications, it is convenient to represent data as a graph, and often these datasets will be quite large. This paper presents an architecture for analyzing massive graphs, with a focus on signal processing applications such as modeling, filtering, and signal detection. We describe the architecture, which covers the entire processing chain, from data storage to graph construction to graph...
In this paper, we will examine the problem of dimensionality reduction of massive disk-resident data sets. Graph mining has become important in recent years because of its numerous applications in community detection, social networking, and web mining. Many graph data sets are defined on massive node domains in which the number of nodes in the underlying domain is very large. As a result, it is often...
Discriminative subgraphs can be used to characterize complex graphs, construct graph classifiers and generate graph indices. The search space for discriminative subgraphs is usually prohibitively large. Most measurements of interestingness of discriminative subgraphs are neither monotonic nor antimonotonic with respect to subgraph frequencies. Therefore, branch-and-bound algorithms are unable to mine...
Some of the major challenges in current clustering applications include: some data sets are so huge that it is difficult to load the entire data sets into memory for clustering, the data sets are often distributed over different locations for various reasons, which makes it impossible to process them centrally, and when lacking prior knowledge of the unknown data sets, it is troublesome to choose...
Association rule mining is a very important research topic in the field of data mining. Discovering frequent itemsets is the key process in association rule mining. Traditional association rule algorithms adopt an iterative method, which requires very large calculations and a complicated transaction process. The FAR (Feature Matrix Based Association Rules) algorithm solves this problem. However, the FAR algorithm...
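As a rough illustration of the iterative (level-wise) approach to frequent itemset discovery mentioned in the abstract above, the following is a minimal Apriori-style sketch. It is not the FAR algorithm and not taken from the paper; the function name `frequent_itemsets`, the `min_support` parameter, and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; hypothetical helper, not the FAR algorithm.

    Returns a dict mapping each frequent itemset (a frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]  # level-1 candidates
    result = {}
    k = 1
    while current:
        # One full pass over the data per level: this is the costly iteration.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates and prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, support in frequent_itemsets(baskets, min_support=2).items():
        print(sorted(itemset), support)
```

Each level rescans every transaction, which is where the large computational cost of the iterative method comes from.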
In many clustering applications, the data sets are high-dimensional, sparse and binary, causing traditional algorithms to fail on such data. In this paper, we present a new clustering algorithm based on graph partition for high-dimensional data, which, by defining the feature vector of attribute-value distribution and the similarity of attribute-value distribution, and creating...
The connected components of an undirected graph play an important part in graph theory. It is straightforward to compute the connected components of a graph in linear time using either breadth-first search or depth-first search. However, when confronted with large-scale data, both algorithms are hard to execute. In this paper, we introduce a recently proposed community detection technique...
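The linear-time baseline mentioned in the abstract above can be sketched with breadth-first search. This is only an illustration of the standard technique, not the community detection method the paper introduces; the function name `connected_components` and the edge-list input format (nodes labeled `0..num_nodes-1`) are assumptions made for this example.

```python
from collections import deque


def connected_components(num_nodes, edges):
    """Return the connected components of an undirected graph via BFS.

    Hypothetical helper: nodes are assumed to be integers 0..num_nodes-1 and
    edges a list of (u, v) pairs. Runs in O(num_nodes + len(edges)) time.
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = [False] * num_nodes
    components = []
    for start in range(num_nodes):
        if seen[start]:
            continue
        # Every node reachable from `start` belongs to one component.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    queue.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

The sketch keeps the whole adjacency list in memory, which hints at why this approach becomes hard to run on large-scale data.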
Design patterns are an effective way to describe software architecture. But with the increasing size and complexity of software, it is difficult to recognize which design patterns are used in it. To comprehend and maintain software systems, many design pattern detection algorithms have been proposed. In this paper, we propose an algorithm to discover design patterns more efficiently by automatic...
A Multi-relational Bayesian Classification Algorithm with Rough Set is proposed in this paper. A relational graph is used to dynamically choose the associative tables associated with the target table, a tuple ID propagation approach is used to directly solve the association rule mining problem with multiple database relations, and the concept of Core in Rough Set is introduced to simplify the associative...
Many previous studies argue convincingly that mining frequent subgraphs is especially useful. Many interesting hidden frequent patterns cannot be found by mining a single graph. Previous approaches such as Quasi-Clique have had little success with the hub problem. In this paper, we introduce a new concept, Correlated-Quasi-Clique, and develop a novel algorithm, CoClique, to address...
This paper puts forward an outlying reduction method based on the power graph. The outlying reduction problem arises from the search of the power graph. This paper applies a pruning strategy during power graph expansion, which greatly reduces storage and computation and improves the algorithm's performance. This method can explain and analyze outliers in the smallest outlying subspace which...
Previous studies of pattern discovery in storage systems focus on sequential pattern (SP) mining in lower-level traces, but they do not scale well to the application level. Patterns at the application level are mostly composed of Contiguous Item Sequential Patterns (CISP), which are much simpler than SP, so it is inefficient to mine CISP with clumsy SP mining algorithms. We...
To gain a competitive advantage and withstand competitive pressure in today's age of technology and growing data, making strong decisions according to customers' needs and market trends has become very important. With the huge amount of data on the internet, web data mining has become very significant. Web Usage helps companies to produce productive information pertaining to the future of their business function...
In data mining, SVD is a popular method for compressing high-dimensional data. Binary matrix factorization (BMF) is a variant of SVD. There are two methods for binary factorization compression: the iterative heuristic and greedy algorithms. However, neither is perfect in applications. The iterative heuristic does not guarantee convergence in most cases and greedy algorithms...