Sequential Bayesian Phylogenetic Inference

Authors
Hoehna, Sebastian [1, 2]
Hsiang, Allison Y. [1, 2]
Affiliations
[1] Ludwig Maximilians Univ Munchen, GeoBioctr LMU, Richard Wagner Str 10, D-80333 Munich, Germany
[2] Ludwig Maximilians Univ Munchen, Dept Earth & Environm Sci, Paleontol & Geobiol, Richard Wagner Str 10, D-80333 Munich, Germany
Keywords
Bayesian inference; divergence time estimation; joint posterior distribution; parameter uncertainty; phylogenetics; RevBayes; ESTIMATING DIVERGENCE TIMES; SUPPORTS SPONGES; UNCERTAINTY; MODEL; CHARACTERS; EVOLUTION; SISTER; TREES
DOI
10.1093/sysbio/syae020
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The ideal approach to Bayesian phylogenetic inference is to estimate all parameters of interest jointly in a single hierarchical model. However, this is often not feasible in practice due to the high computational cost. Instead, phylogenetic pipelines generally consist of sequential analyses, whereby a single point estimate from a given analysis is used as input for the next analysis (e.g., a single multiple sequence alignment is used to estimate a gene tree). In this framework, uncertainty is not propagated from step to step, which can lead to inaccurate or spuriously confident results. Here, we formally develop and test a sequential inference approach for Bayesian phylogenetic inference, which uses importance sampling to generate observations for the next step of an analysis pipeline from the posterior distribution produced in the previous step. The sequential inference approach presented here not only accounts for uncertainty between analysis steps but also allows for greater flexibility in software choice (and hence model availability) and can be computationally more efficient than the traditional joint inference approach when multiple models are being tested. We show that our sequential inference approach is identical in practice to the joint inference approach only if the data contain sufficient information (yielding a narrow posterior distribution) and/or sufficiently many importance samples are used. Conversely, we show that the common practice of using a single point estimate can be biased, for example, when a single phylogeny estimate is used to transform an unrooted phylogeny into a time-calibrated phylogeny. We demonstrate the theory of sequential Bayesian inference using both a toy example and an empirical case study of divergence-time estimation in insects using a relaxed clock model from transcriptome data.
In the empirical example, we estimate 3 posterior distributions of branch lengths from the same data (a DNA character matrix with a GTR+Gamma+I substitution model, an amino acid data matrix with empirical substitution models, and an amino acid data matrix with the PhyloBayes CAT-GTR model). Finally, we apply 3 different node-calibration strategies and show that divergence time estimates are affected by the data source and the underlying substitution process used to estimate branch lengths, as well as by the node-calibration strategy. Thus, our new sequential Bayesian phylogenetic inference provides the opportunity to efficiently test different approaches for divergence time estimation, including branch-length estimation from other software.
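The idea of passing a posterior sample, rather than a point estimate, from one analysis step to the next can be illustrated with a small Python sketch. This is not the authors' code or their toy example: the normal-normal model, the parameter values, and all function names below are assumptions chosen so that the joint answer is known in closed form. Step 1 draws samples of an intermediate quantity b (standing in for a branch length) under a broad step-1 prior q(b); step 2 averages p(b | theta) / q(b) over those samples as an importance-sampling estimate of the marginal likelihood of theta.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(likelihood, grid):
    """Posterior mean and sd of theta on a grid, assuming a flat prior."""
    weights = [likelihood(t) for t in grid]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    return mean, math.sqrt(var)


# Hypothetical toy values (assumptions, not taken from the paper).
Y, S = 1.5, 0.5   # observed datum and its noise sd: y ~ N(b, S^2)
TAU = 3.0         # sd of the step-1 prior q(b) = N(0, TAU^2)
SIGMA = 1.0       # step-2 model: b ~ N(theta, SIGMA^2)


def step1_samples(n, rng):
    """Step 1: sample p(b | y) under q(b); conjugate, so closed form."""
    precision = 1.0 / S**2 + 1.0 / TAU**2
    mean = (Y / S**2) / precision
    return [rng.gauss(mean, math.sqrt(1.0 / precision)) for _ in range(n)]


def sequential_likelihood(samples):
    """Step 2: average p(b_i | theta) / q(b_i) over the step-1 samples."""
    inv_q = [1.0 / normal_pdf(b, 0.0, TAU) for b in samples]

    def likelihood(theta):
        terms = (normal_pdf(b, theta, SIGMA) * w for b, w in zip(samples, inv_q))
        return sum(terms) / len(samples)

    return likelihood


def point_estimate_likelihood(samples):
    """Common shortcut: treat the step-1 posterior mean as observed data."""
    b_hat = sum(samples) / len(samples)
    return lambda theta: normal_pdf(b_hat, theta, SIGMA)


def joint_likelihood(theta):
    """Exact marginal likelihood of the joint model: y ~ N(theta, S^2 + SIGMA^2)."""
    return normal_pdf(Y, theta, math.sqrt(S**2 + SIGMA**2))


rng = random.Random(1)
samples = step1_samples(2000, rng)
grid = [-6.0 + 0.1 * i for i in range(151)]

joint_mean, joint_sd = grid_posterior(joint_likelihood, grid)
seq_mean, seq_sd = grid_posterior(sequential_likelihood(samples), grid)
point_mean, point_sd = grid_posterior(point_estimate_likelihood(samples), grid)

print(f"joint:      mean={joint_mean:.3f} sd={joint_sd:.3f}")
print(f"sequential: mean={seq_mean:.3f} sd={seq_sd:.3f}")
print(f"point est.: mean={point_mean:.3f} sd={point_sd:.3f}")
```

In this toy setting the sequential posterior should track the joint posterior closely, while the point-estimate shortcut yields a narrower posterior because the uncertainty in b is discarded. With fewer step-1 samples or a much broader step-1 posterior, the importance-sampling estimate would be expected to degrade, in line with the conditions stated in the abstract.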
Pages: 704-721
Number of pages: 18