Comparing methods for statistical inference with model uncertainty

Cited: 20
Authors
Porwal, Anupreet [1]
Raftery, Adrian E. [1,2]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Sociol, Seattle, WA 98195 USA
Keywords
Bayesian model averaging; interval estimation; LASSO; model selection; parameter estimation; variable selection; regression shrinkage; priors; regularization; performance; horseshoe
DOI
10.1073/pnas.2120737119
CLC classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biosciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract
Probability models are used for many statistical tasks, notably parameter estimation, interval estimation, inference about model parameters, point prediction, and interval prediction. Thus, choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process. Here we focus on one such choice, that of variables to include in a linear regression model. Many methods have been proposed, including Bayesian and penalized likelihood methods, and it is unclear which one to use. We compared 21 of the most popular methods by carrying out an extensive set of simulation studies based closely on real datasets that span a range of situations encountered in practical data analysis. Three adaptive Bayesian model averaging (BMA) methods performed best across all statistical tasks. These used adaptive versions of Zellner's g-prior for the parameters, where the prior variance parameter g is a function of sample size or is estimated from the data. We found that for BMA methods implemented with Markov chain Monte Carlo, 10,000 iterations were enough. Computationally, we found two of the three best methods (BMA with g = √n, and empirical Bayes-local) to be competitive with the least absolute shrinkage and selection operator (LASSO), which is often preferred as a variable selection technique because of its computational efficiency. BMA performed better than Bayesian model selection (in which just one model is selected).
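The best-performing methods are BMA under versions of Zellner's g-prior. As a minimal illustrative sketch (not the authors' implementation, which relies on MCMC rather than enumeration), the Python below performs exhaustive BMA under a fixed g-prior with the abstract's g = √n choice, using the standard closed-form null-based Bayes factor and the g-prior posterior-mean shrinkage of the OLS coefficients; the function name `bma_g_prior` and the simulated data are hypothetical.

```python
# Minimal sketch of Bayesian model averaging (BMA) for linear regression
# under Zellner's g-prior with g = sqrt(n), one of the three settings the
# abstract reports as best. Exhaustive enumeration of predictor subsets is
# used for clarity; it only scales to small p, whereas the methods compared
# in the paper are implemented with MCMC / adaptive sampling.
import itertools
import numpy as np

def bma_g_prior(X, y, g=None):
    """Exhaustive BMA with a fixed Zellner g-prior and uniform model prior.

    Uses the closed-form Bayes factor of model M against the
    intercept-only null model,
        BF(M) = (1 + g)^((n - 1 - p_M)/2) / (1 + g*(1 - R_M^2))^((n - 1)/2),
    and the g-prior posterior mean E[beta | M, y] = g/(1+g) * beta_OLS.
    """
    n, p = X.shape
    g = np.sqrt(n) if g is None else g           # abstract's g = sqrt(n) default
    yc = y - y.mean()                            # center y; intercept handled implicitly
    tss = yc @ yc
    log_bf, betas = [], []
    for k in range(p + 1):
        for s in itertools.combinations(range(p), k):
            beta = np.zeros(p)
            r2 = 0.0
            if s:
                Xs = X[:, s] - X[:, s].mean(axis=0)
                b, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
                resid = yc - Xs @ b
                r2 = 1.0 - (resid @ resid) / tss
                beta[list(s)] = g / (1.0 + g) * b    # g-prior shrinkage of OLS
            log_bf.append(0.5 * (n - 1 - len(s)) * np.log1p(g)
                          - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
            betas.append(beta)
    log_bf = np.asarray(log_bf)
    w = np.exp(log_bf - log_bf.max())
    w /= w.sum()                                 # posterior model probabilities
    return w @ np.asarray(betas), w              # BMA coefficient estimate, weights

# Hypothetical demo data: y depends on predictors 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 2.0 * X[:, 0] - X[:, 2] + rng.standard_normal(100)
beta_bma, weights = bma_g_prior(X, y)
print(np.round(beta_bma, 2))   # BMA posterior-mean coefficients
```

The empirical Bayes variants named in the abstract would replace the fixed g above with a value estimated from the data, and Bayesian model selection would keep only the single highest-weight model instead of averaging over all of them.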
Pages: 8