Estimating Optimal Species Trees from Incomplete Gene Trees Under Deep Coalescence

Cited by: 25
Authors
Bayzid, Md Shamsuzzoha [1]
Warnow, Tandy [1]
Affiliation
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Funding
National Science Foundation (USA);
Keywords
algorithms; PHYLOGENETIC INFERENCE; SOFTWARE PACKAGE; LOGS SUFFICE; ALIGNMENT; ACCURACY; ALGORITHMS; SUPERTREE; BOUNDS; BUILD; POY;
DOI
10.1089/cmb.2012.0037
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The estimation of species trees typically involves the estimation of trees and alignments on many different genes, so that the species tree can be based on many different parts of the genome. This kind of phylogenomic approach to species tree estimation has the potential to produce more accurate species tree estimates, especially when gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss, and horizontal gene transfer. Because ILS (also called "deep coalescence") is a frequent problem in systematics, many methods have been developed to estimate species trees from gene trees or alignments that specifically take ILS into consideration. In this paper we consider the problem of estimating species trees from gene trees and alignments for the general case where the gene trees and alignments can be incomplete, which means that not all the genes contain sequences for all the species. We formalize optimization problems for this context and prove theoretical results for these problems. We also present the results of a simulation study evaluating existing methods for estimating species trees from incomplete gene trees. Our simulation study shows that *BEAST, a statistical method for estimating species trees from gene sequence alignments, produces by far the most accurate species trees. However, *BEAST can only be run on small datasets. The second most accurate method, MRP (a standard supertree method), can analyze very large datasets and produces very good trees, making MRP a potentially acceptable alternative to *BEAST for large datasets.
Pages: 591-605
Number of pages: 15