Computing phylogenetic roots with bounded degrees and errors

被引:41
作者
Chen, ZZ [1 ]
Jiang, T
Lin, GH
机构
[1] Tokyo Denki Univ, Dept Math Sci, Hatoyama, Saitama 3500394, Japan
[2] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[4] McMaster Univ, Hamilton, ON L8S 4L8, Canada
[5] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
关键词
phylogeny; phylogenetic root; computational biology; efficient algorithm; NP-hard;
D O I
10.1137/S0097539701389154
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Given a set of species and their similarity data, an important problem in evolutionary biology is how to reconstruct a phylogeny (also called evolutionary tree) so that species are close in the phylogeny if and only if they have high similarity. Assume that the similarity data are represented as a graph G = ( V, E), where each vertex represents a species and two vertices are adjacent if they represent species of high similarity. The phylogeny reconstruction problem can then be abstracted as the problem of finding a (phylogenetic) tree T from the given graph G such that ( 1) T has no degree-2 internal nodes, (2) the external nodes (i.e., leaves) of T are exactly the elements of V, and (3) (u, v) is an element of if and only if d(T) (u, v) less than or equal to k for some fixed threshold k, where d(T) (u, v) denotes the distance between u and v in tree T. This is called the phylogenetic kth root problem (PRk), and such a tree T, if it exists, is called a phylogenetic kth root of graph G. The computational complexity of PRk is open, except for k less than or equal to4. In this paper, we investigate PRk under a natural restriction that the maximum degree of the phylogenetic root is bounded from above by a constant. Our main contribution is a linear-time algorithm that determines if G has such a phylogenetic kth root, and if so, demonstrates one. On the other hand, because in practice the collected similarity data are usually not perfect and may contain errors, we propose to study a generalized version of PRk where the output phylogeny is required only to be an approximate root of the input graph. We show that this and other related problems are computationally intractable.
引用
收藏
页码:864 / 879
页数:16
相关论文
共 12 条
  • [1] [Anonymous], MOL SYSTEMATICS
  • [2] [Anonymous], 1994, SPRINGER LECT NOTES
  • [3] BODLAENDER HL, 1989, LECT NOTES COMPUT SC, V344, P1
  • [4] BRANDSTADT A, 1999, SIAM MONOGR DISCRETE, V3
  • [5] CHEN ZZ, 2003, TR0306 U ALB DEP COM
  • [6] JIANG T, 2000, UNPUB CLOSET TREE KT
  • [7] Tree powers
    Kearney, PE
    Corneil, DG
    [J]. JOURNAL OF ALGORITHMS, 1998, 29 (01) : 111 - 131
  • [8] NP-HARD PROBLEMS IN HIERARCHICAL-TREE CLUSTERING
    KRIVANEK, M
    MORAVEK, J
    [J]. ACTA INFORMATICA, 1986, 23 (03) : 311 - 323
  • [9] Lin GH, 2001, LECT NOTES COMPUT SC, V1969, P539
  • [10] ALGORITHMS FOR SQUARE ROOTS OF GRAPHS
    LIN, YL
    SKIENA, SS
    [J]. SIAM JOURNAL ON DISCRETE MATHEMATICS, 1995, 8 (01) : 99 - 118