Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression

Cited by: 8
Authors
Latouche, Pierre [1 ]
Mattei, Pierre-Alexandre [2 ]
Bouveyron, Charles [2 ]
Chiquet, Julien [3 ]
Affiliations
[1] Univ Paris 01, Lab SAMM, EA 4543, F-75231 Paris 05, France
[2] Univ Paris 05, Lab MAP5, UMR CNRS 8145, Paris, France
[3] USC INRA, UMR CNRS 8071, UEVE, Lab LaMME, Evry, France
Keywords
EM algorithm; High-dimensional data; Linear regression; Occam's razor; Spike-and-slab; Variable selection; MODEL SELECTION; MAXIMUM-LIKELIHOOD; REGULARIZATION; RELEVANCE; LASSO; SPIKE;
DOI
10.1016/j.jmva.2015.09.004
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208; 070103; 0714
Abstract
We address the problem of Bayesian variable selection for high-dimensional linear regression. We consider a generative model that uses a spike-and-slab-like prior distribution obtained by multiplying a deterministic binary vector, which encodes the sparsity pattern of the problem, with a random Gaussian parameter vector. The originality of this work lies in performing inference by relaxing the model and maximizing a type-II log-likelihood with an EM algorithm. Model selection is then carried out by applying Occam's razor to a path of models produced by the EM algorithm. Numerical comparisons between our method, called spinyReg, and state-of-the-art high-dimensional variable selection algorithms (such as the lasso, the adaptive lasso, stability selection, and spike-and-slab procedures) are reported. Competitive variable selection results and predictive performance are achieved on both simulated and real benchmark data sets. An original regression data set involving the prediction of the number of visitors to the Orsay museum in Paris using bike-sharing system data is also introduced, illustrating the efficiency of the proposed approach. The R package spinyReg implementing the method proposed in this paper is available on CRAN. (C) 2015 Elsevier Inc. All rights reserved.
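The abstract describes a spike-and-slab-like prior built by multiplying a deterministic binary vector (the "spike" part, selecting relevant predictors) with a Gaussian parameter vector (the "slab"). The following minimal NumPy sketch illustrates that generative model only; it is not the authors' implementation, and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, n_relevant = 50, 100, 5  # n samples, p predictors, 5 relevant predictors

# Deterministic binary vector z: 1 for relevant predictors, 0 otherwise.
z = np.zeros(p)
z[:n_relevant] = 1.0

# Random Gaussian parameter vector w (the "slab" component).
w = rng.normal(0.0, 1.0, size=p)

# Effective regression coefficients: elementwise product z * w.
# Exactly zero for irrelevant predictors, Gaussian for relevant ones.
beta = z * w

# Simulated high-dimensional regression data (p > n) with Gaussian noise.
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(0.0, 0.1, size=n)
```

Inference in the paper then proceeds by relaxing the binary vector to lie in [0, 1]^p and maximizing the type-II (marginal) log-likelihood over it with an EM algorithm, which yields a path of candidate models to compare via Occam's razor.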
Pages: 177-190 (14 pages)