Large amounts of data are collected and stored by different government, industrial, commercial or scientific organizations. As the complexity and volume of the data continue to increase, the task of classifying new unseen data and extracting useful knowledge from the data is becoming practically impossible for humans to do. This makes the automatic knowledge acquisition process not just advantageous...
As mentioned in Chapter 1, an important category of complex data is tree-structured data. It occurs in a variety of different domains and applications such as Web Intelligence applications, bioinformatics, natural language processing, programming compilation, scientific knowledge management and querying, etc. (Wang et al. 1994). Mining of tree-structured data introduces significant new challenges...
In this chapter we discuss the main issues that arise when developing tree mining algorithms. The way an algorithm is implemented often largely determines its efficiency. The main aspects that affect the overall performance of the algorithm are the way the document structure is represented at the algorithm level, the way candidate subtrees are enumerated and counted and, in the case...
In this chapter, we describe the main characteristics of the Tree Model Guided (TMG) Framework for frequent subtree mining. This framework extends readily to all of the current problems in frequent subtree mining (Hadzic 2008; Tan 2008). An algorithm is considered extensible in the sense that minimal effort is required to adjust the general framework so that different but related problems...
In this chapter, we will elaborate on the overall TMG framework for mining ordered subtrees as described in Chapter 4 (Tan 2008). In an ordered tree, for each internal node, the order of its children is fixed. Such trees have found many useful applications in areas such as vision, natural language processing, molecular biology, programming compilation etc. (Wang, Zhang, Jeong & Shasha 1994). In...
This chapter describes the extension of the TMG framework for the mining of unordered induced/embedded subtrees. While in online tree-structured documents such as XML the information is presented in a particular order, in many applications the order among the sibling-nodes is considered unimportant or irrelevant to the task and is often not available. If one is interested in comparing different document...
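To illustrate what it means for sibling order to be irrelevant, here is a minimal sketch in Python. It is not the TMG framework's own representation: the `(label, children)` tuple encoding and the `canonical` helper are assumptions made for illustration only. Sorting the children recursively gives two trees the same canonical form exactly when they differ only in the order of siblings.

```python
def canonical(tree):
    """Return an order-independent form of a (label, children) tree.

    Hypothetical helper: children are canonicalised recursively and then
    sorted, so sibling order no longer affects the result.
    """
    label, children = tree
    return (label, tuple(sorted(canonical(child) for child in children)))


# Two trees that contain the same nodes, with siblings b and c swapped.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])

same_as_ordered = t1 == t2                          # order-sensitive comparison
same_as_unordered = canonical(t1) == canonical(t2)  # order-insensitive comparison
```

Under this encoding the two trees are distinct as ordered trees but identical as unordered trees, which is the distinction the chapter relies on.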
For certain applications, the distance between the nodes in a hierarchical structure could be considered important and two embedded subtrees with different distance relationships among the nodes need to be considered as separate entities. The embedded subtrees extracted using the traditional definition are incapable of being further distinguished based upon the node distance within that subtree. In...
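The effect of node distance can be sketched as follows. This is an illustrative assumption, not the chapter's definition: distance is taken here to be the number of edges between an ancestor and a descendant, and the `(label, children)` encoding and the `ancestor_descendant_pairs` helper are hypothetical.

```python
def ancestor_descendant_pairs(tree, ancestors=()):
    """Yield (ancestor_label, descendant_label, distance) triples.

    Assumption: distance is the number of edges separating the two nodes.
    """
    label, children = tree
    # The nearest ancestor is one edge away, the next is two, and so on.
    for distance, ancestor in enumerate(reversed(ancestors), start=1):
        yield (ancestor, label, distance)
    for child in children:
        yield from ancestor_descendant_pairs(child, ancestors + (label,))


deep = ("a", [("b", [("c", [])])])   # a -> b -> c
shallow = ("a", [("c", [])])         # a -> c

deep_pairs = set(ancestor_descendant_pairs(deep))
shallow_pairs = set(ancestor_descendant_pairs(shallow))

# Ignoring distance, both trees contain the embedded relationship a..c.
without_distance = {(a, d) for a, d, _ in deep_pairs} & {(a, d) for a, d, _ in shallow_pairs}
# Keeping distance, the two occurrences of a..c are no longer the same pattern.
with_distance = deep_pairs & shallow_pairs
```

Here `without_distance` contains the pair `("a", "c")`, while `with_distance` is empty, because `c` lies two edges below `a` in one tree and one edge below it in the other.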
In general, for frequent pattern mining problems, the candidate enumeration process exhaustively enumerates all possible combinations of itemsets that are a subset of a given database. This process is known to be very expensive since, in many circumstances, the number of candidates to enumerate is quite large, and also the frequent patterns present in real-world data can be fairly long (Bayardo 1998)...
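The cost of exhaustive enumeration can be seen in a small sketch. This is a naive illustration rather than any algorithm from the book; the function names and the toy transactions are assumptions. With n distinct items there are 2^n - 1 non-empty candidate itemsets, each of which must be counted against the database.

```python
from itertools import combinations


def enumerate_candidates(transactions):
    """Yield every non-empty itemset over the items present in the database."""
    items = sorted(set().union(*transactions))
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            yield frozenset(candidate)


def support(candidate, transactions):
    """Count the transactions that contain every item of the candidate."""
    return sum(1 for transaction in transactions if candidate <= transaction)


# Toy database with four distinct items: a, b, c, d.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("bd")]
candidates = list(enumerate_candidates(transactions))
frequent = {c for c in candidates if support(c, transactions) >= 2}
```

Four items already produce 15 candidates, of which only `{a}`, `{b}` and `{a, b}` reach a support of 2. The candidate count doubles with every additional item, which is why exhaustive enumeration becomes impractical on real data.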
The aim of this chapter is to discuss the applications of tree mining algorithms with respect to the general knowledge analysis task, as well as some specific applications. The implications of using different tree mining parameters (i.e. subtree types, support definitions, constraints) are discussed, and illustrative scenarios are used to indicate useful application areas for different parameters...
Data mining strives to find frequent patterns in data. The types of data, structures and patterns to be found characterize a task in data mining. Advancements in data collection technologies have contributed to the high volumes of sales data. Sales data typically consists of transaction timestamps and the list of items bought by customers. If one is interested in inter-transactional patterns, sales...
The contents of the book have focused so far on the mining of data where the underlying structure is characterized by special types of graphs where cycles are not allowed, i.e. acyclic graphs or trees. The focus of this chapter is on the frequent pattern mining problem where the underlying structure of the data can be of general graph type where cycles are allowed. These kinds of representations allow...
This chapter will discuss some new research directions in the frequent subtree mining field. These directions are discussed from both the application and technical perspectives. Since frequent subtree mining (FSM) is a relatively new field compared with frequent itemset/sequence mining, many lessons can be learned from the more mature research in frequent itemset/sequence mining. A drawback of frequent pattern...