Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting

Cited by: 202
Authors
Mirarab, Siavash [1]
Bayzid, Md Shamsuzzoha [1]
Warnow, Tandy [1, 2, 3]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
[3] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
National Science Foundation (USA)
Keywords
concatenation; consensus methods; gene tree discordance; incomplete lineage sorting; MP-EST; MRL; MRP; multilocus bootstrapping; species tree estimation; supertree methods; GENE TREES; PHYLOGENY INFERENCE; MAXIMUM-LIKELIHOOD; SISTER GROUP; CHOICE; ALGORITHMS; SIMULATION; TURTLES; TIMES;
DOI
10.1093/sysbio/syu063
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST (one of the most popular species tree estimation methods designed to address ILS), as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there are an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.
Pages: 366-380
Page count: 15