Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods

Cited by: 77
Authors
Ogilvie, Huw A. [1]
Heled, Joseph [2, 3]
Xie, Dong [2, 3]
Drummond, Alexei J. [2, 3]
Affiliations
[1] Australian Natl Univ, Res Sch Biol, Evolut Ecol & Genet, Canberra, ACT, Australia
[2] Univ Auckland, Dept Comp Sci, Auckland 1, New Zealand
[3] Univ Auckland, Allan Wilson Ctr Mol Ecol & Evolut, Auckland 1, New Zealand
Funding
Australian Research Council
Keywords
Bayesian phylogenetics; Concatenation; Gene tree; Multispecies coalescent; Phylogenomics; Species tree; Supermatrix; SPECIES TREE ESTIMATION; GENE TREES; PHYLOGENETIC ANALYSIS; MAXIMUM-LIKELIHOOD; COALESCENT; HYBRIDIZATION; DISCOVERY; INFERENCE; RECONSTRUCTION; CONCATENATION
DOI
10.1093/sysbio/syv118
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies which found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behavior of *BEAST, and enable quantitative prediction of the impact increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.
Pages: 381-396
Number of pages: 16