The Asymptotic Behavior of Bootstrap Support Values in Molecular Phylogenetics

Cited by: 9
Authors
Huang, Jun [1, 2]
Liu, Yuting [1]
Zhu, Tianqi [3]
Yang, Ziheng [2]
Affiliations
[1] Beijing Jiaotong Univ, Dept Math, Beijing 100044, Peoples R China
[2] UCL, Dept Genet Evolut & Environm, Gower St, London WC1E 6BT, England
[3] Chinese Acad Sci, Acad Math & Syst Sci, Natl Ctr Math & Interdisciplinary Sci, Key Lab Random Complex Struct,Data Sci, Beijing 100000, Peoples R China
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
Bootstrap; model selection; star-tree paradox; support value; MAXIMUM-LIKELIHOOD-ESTIMATION; POSTERIOR PROBABILITIES; EVOLUTIONARY TREES; DNA-SEQUENCES; CONFIDENCE; MODELS; SELECTION;
DOI
10.1093/sysbio/syaa100
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The phylogenetic bootstrap is the most commonly used method for assessing statistical confidence in estimated phylogenies by non-Bayesian methods such as maximum parsimony and maximum likelihood (ML). It is observed that bootstrap support tends to be high in large genomic data sets whether or not the inferred trees and clades are correct. Here, we study the asymptotic behavior of bootstrap support for the ML tree in large data sets when the competing phylogenetic trees are equally right or equally wrong. We consider phylogenetic reconstruction as a problem of statistical model selection when the compared models are nonnested and misspecified. The bootstrap is found to have qualitatively different dynamics from Bayesian inference and does not exhibit the polarized behavior of posterior model probabilities, consistent with the empirical observation that the bootstrap is more conservative than Bayesian probabilities. Nevertheless, bootstrap support similarly shows fluctuations among large data sets, with no convergence to a point value, when the compared models are equally right or equally wrong. Thus, in large data sets strong support for wrong trees or models is likely to occur. Our analysis provides a partial explanation for the high bootstrap support values for incorrect clades observed in empirical data analysis.
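
The behavior described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis. It assumes that two competing trees being "equally right or equally wrong" can be represented by per-site log-likelihood differences with mean zero, drawn here from a standard normal distribution. The names `bootstrap_support` and `simulate_supports` are hypothetical, and no tree is actually estimated.

```python
import random


def bootstrap_support(site_diffs, n_boot=200, rng=None):
    """Fraction of bootstrap replicates in which tree 1 beats tree 2.

    site_diffs holds per-site log-likelihood differences (tree 1 minus
    tree 2). Each replicate resamples sites with replacement and compares
    the summed difference with zero. Ties count against tree 1.
    """
    rng = rng or random.Random(0)
    n = len(site_diffs)
    wins = 0
    for _ in range(n_boot):
        resampled = rng.choices(site_diffs, k=n)  # sample sites with replacement
        if sum(resampled) > 0:
            wins += 1
    return wins / n_boot


def simulate_supports(n_sites=1000, n_datasets=5, seed=1):
    """Bootstrap support for tree 1 in several independent simulated data sets.

    Assumption: the two trees are equally good, so the per-site
    differences are drawn with expectation zero.
    """
    rng = random.Random(seed)
    supports = []
    for _ in range(n_datasets):
        diffs = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        supports.append(bootstrap_support(diffs, rng=rng))
    return supports


if __name__ == "__main__":
    for n_sites in (100, 1000, 10000):
        print(n_sites, simulate_supports(n_sites=n_sites))
```

Running the script prints the support for tree 1 in five simulated data sets at each of three sizes, which allows the variation among data sets of the same size to be compared as the number of sites grows.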
Pages: 774-785
Number of pages: 12