Projective inference in high-dimensional problems: Prediction and feature selection

Cited: 68
Authors
Piironen, Juho [1 ]
Paasiniemi, Markus [1 ]
Vehtari, Aki [1 ]
Affiliations
[1] Aalto Univ, Dept Comp Sci, Helsinki Inst Informat Technol HIIT, Espoo, Finland
Funding
Academy of Finland
Keywords
Projection; prediction; feature selection; sparsity; post-selection inference; BAYESIAN VARIABLE SELECTION; GENERALIZED LINEAR-MODELS; POSTERIOR CONCENTRATION; HORSESHOE ESTIMATOR; REGRESSION; REGULARIZATION; SHRINKAGE; CHOICE; NEEDLES; STRAW;
DOI
10.1214/20-EJS1711
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
This paper reviews predictive inference and feature selection for generalized linear models with scarce but high-dimensional data. We demonstrate that in many cases one can benefit from a decision-theoretically justified two-stage approach: first, construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize those predictions. The model built in the first step is referred to as the reference model, and the operation in the second step as predictive projection. The key characteristic of this approach is that it strikes an excellent tradeoff between sparsity and predictive accuracy; the gain comes from utilizing all available information, including the prior and the information carried by the left-out features. We review several methods that follow this principle and provide novel methodological contributions. We present a new projection technique that unifies two existing techniques and is both accurate and fast to compute. We also propose a way of evaluating the feature selection process using fast leave-one-out cross-validation, which allows for easy and intuitive model size selection. Furthermore, we prove a theorem that helps to understand the conditions under which the projective approach can be beneficial. The key ideas are illustrated via several experiments using simulated and real-world data.
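To make the two-stage idea concrete, below is a minimal Python sketch of predictive projection with greedy forward search for the Gaussian linear special case. It assumes a point-estimate reference model, whereas the paper projects draw by draw from the full posterior (the authors' methods are implemented in the R package projpred); the names project_gaussian and forward_select are illustrative, not from the paper. In the Gaussian case, minimizing the KL divergence from the reference predictive distribution to a submodel reduces to regressing the reference model's fitted values on the selected columns and inflating the noise variance by the discarded signal.

```python
import numpy as np

def project_gaussian(X, beta_ref, sigma2_ref, subset):
    """Project a Gaussian linear reference model onto a feature subset.

    For a Gaussian observation model, the KL-minimizing submodel is the
    least-squares fit of the reference model's fitted values on the chosen
    columns; the projected noise variance absorbs the discarded signal.
    """
    y_ref = X @ beta_ref                      # reference model's fitted values
    Xs = X[:, subset]
    beta_proj, *_ = np.linalg.lstsq(Xs, y_ref, rcond=None)
    resid = y_ref - Xs @ beta_proj
    sigma2_proj = sigma2_ref + resid @ resid / X.shape[0]
    return beta_proj, sigma2_proj

def forward_select(X, beta_ref, sigma2_ref, max_size):
    """Greedy forward search over feature subsets.

    At each step, add the feature whose inclusion yields the smallest
    projected noise variance (equivalently, the smallest KL divergence
    from the reference model in the Gaussian case).
    """
    selected, remaining, path = [], list(range(X.shape[1])), []
    for _ in range(max_size):
        scores = [project_gaussian(X, beta_ref, sigma2_ref, selected + [j])[1]
                  for j in remaining]
        best = remaining[int(np.argmin(scores))]
        selected.append(best)
        remaining.remove(best)
        path.append(list(selected))
    return path  # nested subsets of increasing size
```

In a full workflow as the abstract describes, the reference model would be fitted with a regularizing prior such as the horseshoe, the projection would be repeated for each posterior draw, and the final model size would be chosen by comparing the fast leave-one-out predictive performance of the projected submodels along the search path against that of the reference model.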
Pages: 2155-2197
Number of pages: 43