Stochastic Variational Inference for Bayesian Phylogenetics: A Case of CAT Model

Times cited: 19
Authors
Dang, Tung [1]
Kishino, Hirohisa [1]
Affiliations
[1] Univ Tokyo, Dept Agr & Environm Biol, Tokyo, Japan
Funding
Japan Society for the Promotion of Science
Keywords
variational inference; optimization; Bayesian mixture model; phylogenetics; EVOLUTIONARY TREES; MIXTURE MODEL; INFORMATION; SEQUENCES; MUTATIONS;
DOI
10.1093/molbev/msz020
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The pattern of molecular evolution varies among gene sites and genes in a genome. By taking into account the complex heterogeneity of evolutionary processes among sites in a genome, Bayesian infinite mixture models of genomic evolution enable robust phylogenetic inference. With large modern data sets, however, the computational burden of Markov chain Monte Carlo sampling techniques becomes prohibitive. Here, we have developed a variational Bayesian procedure to speed up the widely used PhyloBayes MPI program, which deals with the heterogeneity of amino acid profiles. Rather than sampling from the posterior distribution, the procedure approximates the (unknown) posterior distribution using a manageable distribution called the variational distribution. The parameters of the variational distribution are estimated by minimizing the Kullback-Leibler divergence. To examine performance, we analyzed three empirical data sets consisting of mitochondrial, plastid-encoded, and nuclear proteins. Our variational method accurately approximated the Bayesian inference of the phylogenetic tree, mixture proportions, and the amino acid propensity of each component of the mixture while using orders of magnitude less computational time.
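The idea of replacing sampling with an optimized approximating distribution can be illustrated on a toy problem. The sketch below is not the authors' procedure and has nothing to do with the CAT model or PhyloBayes MPI; it assumes a deliberately simple setting (Gaussian observations with unit variance and a Gaussian prior on the mean) in which the exact posterior is known, so the fitted variational distribution can be checked against it. The function names `fit_gaussian_q` and `exact_posterior`, the step size, and the simulated data are all hypothetical choices made for illustration. Maximizing the evidence lower bound (ELBO), as done here, is equivalent to minimizing the Kullback-Leibler divergence from the variational distribution to the posterior.

```python
import math
import random


def fit_gaussian_q(data, prior_var, lr=0.01, steps=2000):
    """Fit q(mu) = N(m, s^2) by gradient ascent on the ELBO.

    Toy model (an assumption for this sketch):
        x_i ~ N(mu, 1),  mu ~ N(0, prior_var).
    Up to constants, the ELBO is
        -0.5 * sum((x_i - m)^2 + s^2) - 0.5 * (m^2 + s^2) / prior_var + log(s).
    """
    n = len(data)
    sum_x = sum(data)
    m, rho = 0.0, 0.0  # rho = log(s) keeps s positive during optimization
    for _ in range(steps):
        s2 = math.exp(2.0 * rho)
        # Closed-form ELBO gradients with respect to m and rho.
        grad_m = sum_x - n * m - m / prior_var
        grad_rho = 1.0 - (n + 1.0 / prior_var) * s2
        # Plain gradient ascent; lr must stay below 2 / (n + 1 / prior_var).
        m += lr * grad_m
        rho += lr * grad_rho
    return m, math.exp(rho)


def exact_posterior(data, prior_var):
    """Exact conjugate posterior mean and standard deviation for the toy model."""
    precision = len(data) + 1.0 / prior_var
    return sum(data) / precision, math.sqrt(1.0 / precision)


# Simulated data (hypothetical): 50 draws around a true mean of 2.0.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]

m, s = fit_gaussian_q(data, prior_var=100.0)
print("variational:", m, s)
print("exact:      ", *exact_posterior(data, prior_var=100.0))
```

Because the variational family here contains the true posterior, the optimizer recovers it essentially exactly. In a mixture model such as the one described in the abstract, the variational family is only an approximation, and the fit is judged empirically.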
Pages: 825-833
Number of pages: 9