Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

Times cited: 68
Authors
Kolaczkowski, Bryan [1]
Thornton, Joseph W. [1,2]
Affiliations
[1] Univ Oregon, Ctr Ecol & Evolutionary Biol, Eugene, OR 97403 USA
[2] Univ Oregon, Howard Hughes Med Inst, Eugene, OR 97403 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
MAXIMUM-LIKELIHOOD-ESTIMATION; SUBSTITUTION RATES VARY; POSTERIOR PROBABILITIES; PARSIMONY; BOOTSTRAP; INFERENCE; MODEL; EVOLUTION; SUPPORT; TREES;
DOI
10.1371/journal.pone.0007891
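Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract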
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor, maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long-branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias, which is apparent both under controlled simulation conditions and in analyses of empirical sequence data, also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages: that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
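The contrast the abstract draws between estimating branch lengths and integrating over them can be written out schematically. The notation here is an assumption made for illustration and is not taken from the paper: D is the sequence alignment, \tau a tree topology, b its vector of branch lengths, and \theta the remaining model parameters.

```latex
% ML: branch lengths and parameters are estimated (maximized) for each topology
\hat{L}(\tau) = \max_{b,\,\theta} \; P(D \mid \tau, b, \theta)

% BI: branch lengths and parameters are integrated out under their prior
P(\tau \mid D) \;\propto\; P(\tau) \int\!\!\int
    P(D \mid \tau, b, \theta)\, p(b, \theta \mid \tau)\; \mathrm{d}b\, \mathrm{d}\theta
```

Under this reading, ML compares topologies at their best-fitting branch lengths, while BI compares them by a prior-weighted average over all branch lengths. The abstract attributes the long-branch attraction bias to this averaging step.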
Pages: 12