Bayesian model selection via mean-field variational approximation

Cited: 3
Authors
Zhang, Yangfan [1 ]
Yang, Yun [1 ,2 ]
Affiliations
[1] Univ Illinois, Dept Stat, Champaign, IL USA
[2] Univ Illinois, Dept Stat, 605 E Springfield Ave, Champaign, IL 61820 USA
Funding
U.S. National Science Foundation;
Keywords
Bayesian inference; coordinate ascent; mean-field inference; oracle inequality; CONVERGENCE-RATES; INFORMATION CRITERION; ASYMPTOTIC NORMALITY; MAXIMUM-LIKELIHOOD; INFERENCE; SINGULARITY;
DOI
10.1093/jrsssb/qkad164
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208; 070103; 0714;
Abstract
This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference that allows for latent variables and model misspecification. Concretely, we show a Bernstein-von Mises (BvM) theorem for the variational distribution from MF under possible model misspecification, which implies the distributional convergence of the MF variational approximation to a normal distribution centred at the maximum likelihood estimator. Motivated by the BvM theorem, we propose a model selection criterion using the evidence lower bound (ELBO), and demonstrate that the model selected by ELBO tends to asymptotically agree with the one selected by the commonly used Bayesian information criterion (BIC) as the sample size tends to infinity. Compared to BIC, ELBO tends to incur smaller approximation error to the log-marginal likelihood (a.k.a. model evidence) due to a better dimension dependence and full incorporation of the prior information. Moreover, we show the geometric convergence of the coordinate ascent variational inference algorithm, which provides practical guidance on how many iterations one typically needs to run when approximating the ELBO. These findings demonstrate that variational inference is capable of providing a computationally efficient alternative to conventional approaches in tasks beyond obtaining point estimates.
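The geometric convergence of coordinate ascent variational inference (CAVI) mentioned in the abstract can be illustrated on a standard textbook toy problem (a hypothetical example, not the paper's setting): mean-field approximation of a bivariate Gaussian posterior with unit variances and correlation rho. Each coordinate update sets the variational mean to the exact conditional mean, and the error in the variational means contracts by a factor of rho**2 per full sweep:

```python
# Toy CAVI sketch (assumed example, not the paper's algorithm):
# approximate a bivariate Gaussian posterior N((m1, m2), [[1, rho], [rho, 1]])
# by a product q(theta1) q(theta2). Each update matches the exact
# conditional mean given the other coordinate's current variational mean.

m1, m2 = 1.0, 2.0     # posterior means
rho = 0.8             # posterior correlation

mu1, mu2 = 0.0, 0.0   # variational means, arbitrary initialisation
errors = []
for _ in range(20):
    mu1 = m1 + rho * (mu2 - m2)   # update q(theta1) given E_q[theta2]
    mu2 = m2 + rho * (mu1 - m1)   # update q(theta2) given E_q[theta1]
    errors.append(abs(mu2 - m2))

# The per-sweep contraction factor is rho**2 = 0.64: geometric convergence,
# so roughly log(tolerance) / log(rho**2) sweeps suffice for a given accuracy.
ratios = [errors[i + 1] / errors[i] for i in range(5)]
print(ratios)
```

A known caveat of this toy example: the optimal mean-field variance per coordinate is 1 - rho**2, which understates the true marginal variance of 1, the familiar variance underestimation of mean-field approximations.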
Pages: 742-770
Number of pages: 29