Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

Times cited: 91
Authors
Baele, Guy [1]
Lemey, Philippe [1]
Vansteelandt, Stijn [2]
Affiliations
[1] Katholieke Univ Leuven, Rega Inst, Dept Microbiol & Immunol, B-3000 Louvain, Belgium
[2] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
Funding
European Research Council;
Keywords
SUBSTITUTION PATTERNS; NORMALIZING CONSTANTS; PHYLOGENETIC MODELS; INFERENCE;
DOI
10.1186/1471-2105-14-85
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes.
Results: We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process.
Conclusions: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
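The idea of estimating a (log) Bayes factor along a direct path between two models, rather than from two separate marginal likelihood estimates, can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that do not come from the abstract: a toy Gaussian model instead of a phylogenetic one, two models that differ only in a known observation standard deviation (`sigma0` versus `sigma1`) and share a N(0, tau²) prior on the mean, exact sampling from each intermediate distribution instead of MCMC, and evenly spaced path steps instead of a sigmoid schedule. All function names are hypothetical. The toy model is chosen because its marginal likelihoods have a closed form, so both estimates can be checked against the exact value.

```python
import numpy as np

def log_lik(theta, y, sigma):
    """Gaussian log-likelihood of data y for each mean in theta (shape (S,))."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)

def exact_log_marginal(y, sigma, tau):
    """Closed-form log marginal likelihood for y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)."""
    n, s, ss = y.size, y.sum(), np.sum(y**2)
    v = sigma**2 + n * tau**2
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * ((n - 1) * np.log(sigma**2) + np.log(v))
            - 0.5 * (ss - tau**2 * s**2 / v) / sigma**2)

def sample_path(beta, y, sigma0, sigma1, tau, size, rng):
    """Draw theta from q_beta, proportional to L1^beta * L0^(1-beta) * prior.

    In this toy setting q_beta is Gaussian, so it can be sampled exactly;
    a real analysis would run MCMC at each beta instead.
    """
    w = beta / sigma1**2 + (1 - beta) / sigma0**2
    prec = y.size * w + 1 / tau**2
    return rng.normal(w * y.sum() / prec, prec**-0.5, size=size)

def direct_log_bayes_factors(y, sigma0, sigma1, tau, n_steps=32, size=2000, seed=0):
    """Return (path sampling, stepping-stone) estimates of log BF of model 1 vs model 0."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)  # assumption: evenly spaced, not sigmoid
    u_means, ss_terms = [], []
    for k, beta in enumerate(betas):
        theta = sample_path(beta, y, sigma0, sigma1, tau, size, rng)
        # u = log(L1 * prior) - log(L0 * prior); the shared prior cancels here
        u = log_lik(theta, y, sigma1) - log_lik(theta, y, sigma0)
        u_means.append(u.mean())
        if k < n_steps:
            # Stepping-stone ratio Z_{beta_{k+1}} / Z_{beta_k} = E_{beta_k}[exp(d * u)],
            # computed with a log-sum-exp shift for numerical stability.
            d = betas[k + 1] - beta
            m = np.max(d * u)
            ss_terms.append(m + np.log(np.mean(np.exp(d * u - m))))
    u_means = np.array(u_means)
    # Path sampling: integrate E_beta[u] over beta in [0, 1] with the trapezoid rule.
    path = np.sum(np.diff(betas) * (u_means[:-1] + u_means[1:]) / 2)
    return path, float(np.sum(ss_terms))
```

Calling `direct_log_bayes_factors(y, 1.0, 2.0, 3.0)` on a small simulated data set and comparing the result with `exact_log_marginal(y, 2.0, 3.0) - exact_log_marginal(y, 1.0, 3.0)` shows how closely each estimator tracks the exact log Bayes factor in this easy low-dimensional case. The abstract's point is that far more computation is needed for high-dimensional models.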
Pages: 18