A fast algorithm for mining frequent ordered subtrees

被引:0
作者
Hido, Shohei [1 ]
Kawano, Hiroyuki [2 ]
机构
[1] Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto
[2] Department of Information and Telecommunication Engineering, Nanzan University, Seto
关键词
Data mining; Frequent ordered subtree discovery; Right-and-left tree join; Semistructured data;
D O I
10.1002/scj.20690
中图分类号
学科分类号
摘要
In this research, the authors took up the problem of discovering frequent ordered subtree patterns in tree structured data sets, which is of note in the actively researched area of frequent partial structure mining for semistructured data. With conventional algorithms, nonredundant frequent candidate tree enumeration is performed using rightmost expansion, but a problem with that approach is the enumeration of a large number of infrequent candidate trees. Accordingly, the authors propose a right-and-left tree join in order for the efficient enumeration of frequent candidate trees, from small-sized frequent trees to larger frequent candidate trees. Furthermore, the authors constructed AMIOT as an algorithm for discovering frequent ordered subtree patterns using the right-and-left tree join for candidate tree enumeration. A performance evaluation using artificial data and XML data indicated that AMIOT was 2.5 to 5 times faster than existing algorithms. © 2007 Wiley Periodicals, Inc.
引用
收藏
页码:34 / 43
页数:9
相关论文
共 13 条
[1]  
Nijssen S., Frequent structure mining: Efficiency issues, 2nd International Workshop on Mining Graphs, Trees and Sequences (MGTS'04)
[2]  
Inokuchi A., Washio T., Motoda H., An apriori-based algorithm for mining frequent substructures from graph data, Proc 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'00), pp. 13-23
[3]  
Inokuchi A., Washio T., Motoda H., Applying the apriori-based graph mining method to mutagenesis data analysis, Comput Aided Chem, 2, pp. 87-92, (2001)
[4]  
Asai T., Abe K., Kawasoe S., Arimura H., Sakamoto H., Arikawa S., Efficient substructure discovery from large semi-structured data, Proc 2nd SIAM International Conference on Data Mining (SDM'02), pp. 158-174
[5]  
Zaki M.J., Efficiently mining frequent trees in a forest, Proc 8th International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD'02), pp. 71-80
[6]  
Kudou H., Matsumoto Y., A boosting algorithm for classificaation of semi-structured text, Trans IPS Japan, 45, pp. 2146-2156, (2004)
[7]  
Punin J.R., Krishnamoorthy M.S., Zaki M.J., Web usage mining: Languages and algorithms, (2001)
[8]  
Agrawal R., Srikant R., Fast algorithms for mining association rules, Proc 20th International Conference on Very Large Data Bases (VLDB'94), pp. 487-499
[9]  
Roberto J., Bayardo J., Efficiently mining long patterns from databases, Proc International Conference on Management of Data (ACM SIGMOD'98), pp. 85-93
[10]  
Hido S., Kawano H., Frequent ordered subtree discovery algorithm using left-right tree join. IEICE 16th Data Engineering Workshop (DEWS2005), 6C-i9