From gene trees to species trees

被引：113

作者：

Ma, B

Li, M

Zhang, LX

机构：

[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[2] Univ Waterloo, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada

[3] Kent Ridge Digital Labs, Bioinformat Ctr, Singapore 119613, Singapore

来源：

SIAM JOURNAL ON COMPUTING | 2000年 / 30卷 / 03期

关键词：

gene trees; species trees; NP-hardness; algorithms;

D O I：

10.1137/S0097539798343362

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

This paper studies various algorithmic issues in reconstructing a species tree from gene trees under the duplication and the mutation cost model. This is a fundamental problem in computational molecular biology. Our main results are as follows. 1. A linear time algorithm is presented for computing all the losses in duplications associated with the least common ancestor mapping from a gene tree to a species tree. This answers a problem raised recently by Eulenstein, Mirkin, and Vingron [J. Comput. Bio., 5 ( 1998), pp. 135-148]. 2. The complexity of finding an optimal species tree from gene trees is studied. The problem is proved to be NP-hard for the duplication cost and for the mutation cost. Further, the concept of reconciled trees was introduced by Goodman et al. and formalized by Page for visualizing the relationship between gene and species trees. We show that constructing an optimal reconciled tree for gene trees is also NP-hard. Finally, we consider a general reconstruction problem and show it to be NP-hard even for the well-known nearest neighbor interchange distance. 3. A new and efficiently computable metric is defined based on the duplication cost. We show that the problem of finding an optimal species tree from gene trees is NP-hard under this new metric but it can be approximated within factor 2 in polynomial time. Using this approximation result, we propose a heuristic method for finding a species tree from gene trees with uniquely labeled leaves under the duplication cost. Our experimental tests demonstrate that when the number of species is larger than 15 and gene trees are close to each other, our heuristic method is significantly better than the existing program in Page's GeneTree 1.0 that starts the search from a random tree.

引用

页码：729 / 752

页数：24

共 34 条

[1]

[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness

[2]

Cormen T. H., 1990, INTRO ALGORITHMS

[3] Duplication-based measures of difference between gene and species trees [J].