Mining of Data with Complex Structures

Front Matter

Studies in Computational Intelligence > Mining of Data with Complex Structures

Introduction

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 1-21

Large amounts of data are collected and stored by government, industrial, commercial or scientific organizations. As the complexity and volume of the data continue to increase, classifying new, unseen data and extracting useful knowledge from it is becoming practically impossible to do manually. This makes the automatic knowledge acquisition process not just advantageous...

Tree Mining Problem

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 23-40

As mentioned in Chapter 1, an important category of complex data is tree-structured data. It occurs in a variety of domains and applications, such as Web Intelligence, bioinformatics, natural language processing, program compilation, and scientific knowledge management and querying (Wang et al. 1994). Mining of tree-structured data introduces significant new challenges...

Algorithm Development Issues

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 41-65

In this chapter we discuss the main issues that arise when developing tree mining algorithms. How an algorithm is implemented often largely determines its efficiency. The main aspects that affect its overall performance are how the document structure is represented at the algorithm level, how candidate subtrees are enumerated and counted and, in the case...

Tree Model Guided Framework

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 67-86

In this chapter, we describe the main characteristics of the Tree Model Guided (TMG) Framework for frequent subtree mining. This framework extends well to all of the current frequent subtree mining problems (Hadzic 2008; Tan 2008). An algorithm is considered extendible when minimal effort is required to adjust the general framework so that different but related problems...

TMG Framework for Mining Ordered Subtrees

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 87-138

In this chapter, we elaborate on the overall TMG framework, described in Chapter 4, for mining ordered subtrees (Tan 2008). In an ordered tree, the order of the children of each internal node is fixed. Such trees have found many useful applications in areas such as vision, natural language processing, molecular biology and program compilation (Wang, Zhang, Jeong & Shasha 1994). In...
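To illustrate why a fixed child order matters, the sketch below encodes an ordered labeled tree as a pre-order (depth-first) string in which a backtrack symbol is emitted after each child subtree. Two trees that differ only in the order of a node's children receive different encodings, so they are distinct ordered trees. The `Node` class, the `preorder_encoding` function and the `-1` backtrack marker are hypothetical choices for illustration (the marker follows a common convention in the tree mining literature); this is not necessarily the representation used by the TMG framework.

```python
from dataclasses import dataclass, field
from typing import List

BACKTRACK = "-1"  # hypothetical marker meaning "return to the parent node"


@dataclass
class Node:
    """A labeled tree node whose children list is kept in a fixed order."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder_encoding(node: Node) -> List[str]:
    """Pre-order string encoding of an ordered tree.

    A node's label is emitted when the node is first visited; BACKTRACK is
    emitted after the whole subtree of each child has been visited.
    """
    encoding = [node.label]
    for child in node.children:  # children are visited in their stored order
        encoding.extend(preorder_encoding(child))
        encoding.append(BACKTRACK)
    return encoding


if __name__ == "__main__":
    t1 = Node("a", [Node("b"), Node("c")])
    t2 = Node("a", [Node("c"), Node("b")])  # same nodes, siblings swapped
    print(" ".join(preorder_encoding(t1)))  # a b -1 c -1
    print(" ".join(preorder_encoding(t2)))  # a c -1 b -1
    print(preorder_encoding(t1) == preorder_encoding(t2))  # False
```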

TMG Framework for Mining Unordered Subtrees

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 139-174

This chapter describes the extension of the TMG framework to the mining of unordered induced and embedded subtrees. Although online tree-structured documents such as XML present information in a particular order, in many applications the order among sibling nodes is unimportant or irrelevant to the task, and is often not available. If one is interested in comparing different document...
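When sibling order is irrelevant, trees that differ only in the order of their siblings should be treated as the same pattern. One standard way to achieve this, sketched below as an illustration rather than as the chapter's own canonical form, is to canonicalize each child subtree recursively and then sort the children's encodings, so every unordered tree maps to a single string. The nested-tuple tree representation and the `canonical_form` name are assumptions, and labels are assumed not to contain the characters `(`, `)` or `,`.

```python
from typing import Tuple

# Hypothetical representation: a tree is (label, (child, child, ...)).
Tree = Tuple[str, tuple]


def canonical_form(tree: Tree) -> str:
    """Return a string that is identical for trees that are equal up to sibling order.

    Each child is canonicalized recursively and the resulting strings are
    sorted, so the output does not depend on how siblings were stored.
    Assumes labels never contain '(', ')' or ','.
    """
    label, children = tree
    child_forms = sorted(canonical_form(child) for child in children)
    return label + "(" + ",".join(child_forms) + ")"


if __name__ == "__main__":
    t1 = ("a", (("b", ()), ("c", (("d", ()),))))
    t2 = ("a", (("c", (("d", ()),)), ("b", ())))  # siblings b and c swapped
    print(canonical_form(t1))  # a(b(),c(d()))
    print(canonical_form(t1) == canonical_form(t2))  # True
```

Sorting the children makes the encoding order-independent, while the parentheses keep it unambiguous, so two trees share a canonical form exactly when they are the same unordered labeled tree.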

Mining Distance-Constrained Embedded Subtrees

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 175-190

In certain applications, the distance between nodes in a hierarchical structure is important, so two embedded subtrees with different distance relationships among their nodes need to be treated as separate entities. Embedded subtrees extracted under the traditional definition cannot be further distinguished by the distances between their nodes. In...
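To make the distinction concrete, the sketch below enumerates only the simplest embedded subtrees, ancestor-descendant node pairs, and attaches to each pattern the distance (number of levels) between the two nodes. Restricting to two-node patterns, the parent-index tree representation and transaction-based support counting are simplifying assumptions for illustration; the function names are hypothetical, and this is not the chapter's algorithm.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a tree is a list of (label, parent_index)
# pairs, where parent_index is None for the root.
TreeRows = List[Tuple[str, Optional[int]]]


def ancestor_pairs_with_distance(tree: TreeRows) -> Set[Tuple[str, str, int]]:
    """Return the distinct (ancestor_label, descendant_label, distance) triples in one tree."""
    patterns = set()
    for label, parent in tree:
        distance = 1
        ancestor = parent
        while ancestor is not None:  # walk from the node up to the root
            patterns.add((tree[ancestor][0], label, distance))
            ancestor = tree[ancestor][1]
            distance += 1
    return patterns


def support(database: List[TreeRows]) -> Counter:
    """Transaction-based support: the number of trees containing each pattern."""
    counts: Counter = Counter()
    for tree in database:
        counts.update(ancestor_pairs_with_distance(tree))
    return counts


if __name__ == "__main__":
    database = [
        [("a", None), ("b", 0), ("c", 1)],  # c is two levels below a
        [("a", None), ("c", 0)],            # c is one level below a
    ]
    counts = support(database)
    print(counts[("a", "c", 1)], counts[("a", "c", 2)])  # 1 1
```

Under the traditional embedded-subtree definition, both trees would support a single pattern in which c is a descendant of a; attaching the distance splits it into two patterns with a support of 1 each.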

Mining Maximal and Closed Frequent Subtrees

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 191-199

In general, in frequent pattern mining problems, the candidate enumeration process exhaustively enumerates all possible itemsets contained in a given database. This process is known to be very expensive because, in many circumstances, the number of candidates to enumerate is very large, and the frequent patterns present in real-world data can be fairly long (Bayardo 1998)...
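Restricting the output to maximal or closed patterns is a well-known way to shrink it. As a minimal sketch over itemsets (the chapter itself targets subtrees), the code below filters a complete collection of frequent itemsets: an itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent. The function name and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Itemset = FrozenSet[str]


def closed_and_maximal(frequent: Dict[Itemset, int]) -> Tuple[Set[Itemset], Set[Itemset]]:
    """Split a complete collection of frequent itemsets into closed and maximal ones.

    `frequent` must map every frequent itemset to its support count.
    """
    closed: Set[Itemset] = set()
    maximal: Set[Itemset] = set()
    for itemset, sup in frequent.items():
        supersets = [other for other in frequent if itemset < other]  # proper supersets
        # Supersets never have higher support, so "closed" means all are strictly lower.
        if all(frequent[other] < sup for other in supersets):
            closed.add(itemset)
        # Maximal: no frequent proper superset exists at all.
        if not supersets:
            maximal.add(itemset)
    return closed, maximal


if __name__ == "__main__":
    # Frequent itemsets (minimum support 2) of the transactions {a,b,c}, {a,b}, {a}
    frequent = {
        frozenset({"a"}): 3,
        frozenset({"b"}): 2,
        frozenset({"a", "b"}): 2,
    }
    closed, maximal = closed_and_maximal(frequent)
    print(sorted("".join(sorted(s)) for s in closed))   # ['a', 'ab']
    print(sorted("".join(sorted(s)) for s in maximal))  # ['ab']
```

Every maximal itemset is also closed, so the maximal set is the more compact of the two, while the closed set still preserves the support of every frequent itemset.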

Tree Mining Applications

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 201-247

This chapter discusses the applications of tree mining algorithms to the general knowledge analysis task, as well as some specific applications. The implications of using different tree mining parameters (i.e. subtree types, support definitions and constraints) are discussed, and illustrative scenarios are used to indicate useful application areas for the different parameters...

Extension of TMG Framework for Mining Frequent Subsequences

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 249-286

Data mining strives to find frequent patterns in data. A data mining task is characterized by the type of data, structures and patterns to be found. Advances in data collection technologies have contributed to high volumes of sales data. Sales data typically consist of transaction timestamps and the lists of items bought by customers. If one is interested in inter-transactional patterns, sales...
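As a sketch of the data preparation this implies, the code below groups hypothetical (customer, timestamp, items) records into one time-ordered sequence of itemsets per customer, a common input format for sequential pattern mining, and counts how many customer sequences contain a given subsequence. The record layout, the function names and the example data are assumptions for illustration, not the chapter's own method.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Record = Tuple[str, int, List[str]]  # hypothetical (customer_id, timestamp, items)
Sequence = List[FrozenSet[str]]


def build_sequences(records: List[Record]) -> Dict[str, Sequence]:
    """Group transactions by customer and order each customer's itemsets by timestamp."""
    by_customer = defaultdict(list)
    for customer, timestamp, items in records:
        by_customer[customer].append((timestamp, frozenset(items)))
    return {
        customer: [items for _, items in sorted(transactions, key=lambda t: t[0])]
        for customer, transactions in by_customer.items()
    }


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if each pattern itemset is a subset of a strictly later sequence itemset, in order."""
    position = 0
    for element in pattern:
        # Greedily take the earliest transaction that contains this element.
        while position < len(sequence) and not element <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next element must match a later transaction
    return True


def sequence_support(sequences: Dict[str, Sequence], pattern: Sequence) -> int:
    """Number of customers whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in sequences.values())


if __name__ == "__main__":
    records = [
        ("c1", 2, ["milk"]),
        ("c1", 1, ["bread", "butter"]),
        ("c2", 1, ["bread"]),
        ("c2", 3, ["milk", "eggs"]),
        ("c3", 1, ["milk"]),
        ("c3", 2, ["bread"]),
    ]
    sequences = build_sequences(records)
    pattern = [frozenset({"bread"}), frozenset({"milk"})]
    print(sequence_support(sequences, pattern))  # 2 (customers c1 and c2)
```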

Graph Mining

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 287-300

So far, the book has focused on mining data whose underlying structure is a special type of graph in which cycles are not allowed, i.e. acyclic graphs or trees. This chapter focuses on the frequent pattern mining problem where the underlying structure of the data can be a general graph in which cycles are allowed. These kinds of representations allow...

New Research Directions

Fedja Hadzic, Henry Tan, Tharam S. Dillon

pp. 301-326

This chapter discusses some new research directions in the frequent subtree mining field, from both application and technical perspectives. Since frequent subtree mining (FSM) is a relatively new field compared with frequent itemset/sequence mining, many lessons can be learned from the more mature research in frequent itemset/sequence mining. A drawback of frequent pattern...
