Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees

被引:20
作者
Akutsu, Tatsuya [1 ]
Fukagawa, Daiji [2 ]
Halldorsson, Magnus M. [3 ]
Takasu, Atsuhiro [4 ]
Tanaka, Keisuke [5 ]
机构
[1] Kyoto Univ, Bioinformat Ctr, Inst Chem Res, Kyoto 6110011, Japan
[2] Doshisha Univ, Fac Culture & Informat Sci, Kyoto 6100394, Japan
[3] Reykjavik Univ, ICE TCS, Sch Comp Sci, IS-101 Reykjavik, Iceland
[4] Natl Inst Informat, Tokyo 1018430, Japan
[5] Tokyo Inst Technol, Dept Math & Comp Sci, Tokyo 152, Japan
关键词
Tree edit distance; Approximation algorithms; Parameterized algorithms; Dynamic programming; Unordered trees;
D O I
10.1016/j.tcs.2012.11.017
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Given two rooted, labeled, unordered trees, the common subtree problem is to find a bijective matching between subsets of nodes of the trees of maximum cardinality which preserves labels and ancestry relationship. The tree edit distance problem is to determine the least cost sequence of insertions, deletions and substitutions that converts a tree into another given tree. Both problems are known to be hard to approximate within some constant factor in general. We tackle these problems from two perspectives: giving exact algorithms, either for special cases or in terms of some parameters; and approximation algorithms and hardness of approximation. We present a parameterized algorithm in terms of the number of branching nodes that solves both problems and yields polynomial algorithms for several special classes of trees. This is complemented with a tighter APX-hardness proof that holds when the trees are of height one and two, respectively. Furthermore, we present the first approximation algorithms for both problems. In particular, for the common subtree problem for t trees, we present an algorithm achieving at log(2) (b(OPT) + 1) ratio, where b(OPT) is the number of branching nodes in the optimal solution. We also present constant factor approximation algorithms for both problems in the case of bounded height trees. (C) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:10 / 22
页数:13
相关论文
共 27 条
[1]  
Ahuja R., 1993, NETWORK FLOWS THEORY
[2]  
Akutsu Tatsuya, 2012, Combinatorial Pattern Matching. Proceedings 23rd Annual Symposium, CPM 2012, P360, DOI 10.1007/978-3-642-31265-6_29
[3]   On the approximation of largest common subtrees and largest common point sets [J].
Akutsu, T ;
Halldórsson, MM .
THEORETICAL COMPUTER SCIENCE, 2000, 233 (1-2) :33-50
[4]   Exact algorithms for computing the tree edit distance between unordered trees [J].
Akutsu, Tatsuya ;
Fukagawa, Daiji ;
Takasu, Atsuhiro ;
Tamura, Takeyuki .
THEORETICAL COMPUTER SCIENCE, 2011, 412 (4-5) :352-364
[5]   Approximating Tree Edit Distance through String Edit Distance [J].
Akutsu, Tatsuya ;
Fukagawa, Daiji ;
Takasu, Atsuhiro .
ALGORITHMICA, 2010, 57 (02) :325-348
[6]   Improved approximation of the largest common subtree of two unordered trees of bounded height [J].
Akutsu, Tatsuya ;
Fukagawa, Daiji ;
Takasu, Atsuhiro .
INFORMATION PROCESSING LETTERS, 2008, 109 (02) :165-170
[7]  
[Anonymous], 2006, IEEE Data Eng. Bull
[8]   KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains [J].
Aoki, KF ;
Yamaguchi, A ;
Ueda, N ;
Akutsu, T ;
Mamitsuka, H ;
Goto, S ;
Kanehisa, M .
NUCLEIC ACIDS RESEARCH, 2004, 32 :W267-W272
[9]   Scheduling split intervals [J].
Bar-Yehuda, R. ;
Halldorsson, M. M. ;
Naor, J. ;
Shachnai, H. ;
Shapira, I. .
SIAM JOURNAL ON COMPUTING, 2006, 36 (01) :1-15
[10]   A survey on tree edit distance and related problems [J].
Bille, P .
THEORETICAL COMPUTER SCIENCE, 2005, 337 (1-3) :217-239