Canonical forms for labelled trees and their applications in frequent subtree mining

被引:43
作者
Chi, Y [1 ]
Yang, YR [1 ]
Muntz, RR [1 ]
机构
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
基金
美国国家科学基金会;
关键词
canonical form; frequent subtree; labelled free tree; labelled rooted unordered tree; tree isomorphism;
D O I
10.1007/s10115-004-0180-7
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we first present two canonical forms for labelled rooted unordered trees-the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). Then the canonical forms are applied to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labelled rooted unordered trees. The RootedTreeMiner algorithm uses an enumeration tree to enumerate all (frequent) labelled rooted unordered subtrees. Next, we extend the definition of the DFCF to labelled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labelled free trees. Finally, we study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from real applications.
引用
收藏
页码:203 / 234
页数:32
相关论文
共 32 条
[1]  
Agarwal R., 1994, P 20 INT C VER LARG, V487, P499
[2]   A tree projection algorithm for generation of frequent item sets [J].
Agarwal, RC ;
Aggarwal, CC ;
Prasad, VVV .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) :350-371
[3]  
Aho A.V., 1974, The Design and Analysis of Computer Algorithms
[4]  
[Anonymous], 2002, ALGORITHMS TREES GRA
[5]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[6]  
[Anonymous], 2000, GRAPHS APPL INTRO AP
[7]  
ASAI T, 2002, 2 SIAM INT C DAT MIN
[8]  
ASAI T, 2003, 6 INT C DISC SCI
[9]  
BAYARDO R, 1998, P ACM SIGMOD
[10]  
BUSS SR, 1997, LECT NOTES COMPUTER, V1289, P18