Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection

Citations: 0
Authors
Lacroix, Perrine [1, 2, 3]
Martin, Marie-Laure [2, 3, 4]
Affiliations
[1] Univ Paris Saclay, Lab Math Orsay, CNRS, Orsay, France
[2] Univ Paris Saclay, Univ Evry, Inst Plant Sci Paris Saclay IPS2, CNRS, F-91190 Gif Sur Yvette, France
[3] Univ Paris Cite, Inst Plant Sci Paris Saclay IPS2, F-91190 Gif Sur Yvette, France
[4] Univ Paris Saclay, AgroParisTech, INRAE, UMR MIA Paris Saclay, F-91120 Palaiseau, France
Source
ELECTRONIC JOURNAL OF STATISTICS | 2024, Vol. 18, No. 2
Keywords
Ordered variable selection; prediction; FDR; high-dimension; Gaussian regression; hyperparameter calibration; FALSE DISCOVERY RATE; REGRESSION; STABILITY; INFERENCE; SPARSITY
DOI
10.1214/24-EJS2260
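Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract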
In the context of high-dimensional Gaussian linear regression for ordered variables, we study the variable selection procedure via the minimization of the penalized least-squares criterion. We focus on model selection where the penalty function depends on an unknown multiplicative constant commonly calibrated for prediction. We propose a new proper calibration of this hyperparameter to simultaneously control predictive risk and false discovery rate. We obtain non-asymptotic bounds on the False Discovery Rate with respect to the hyperparameter and we provide an algorithm to calibrate it. This algorithm is based on quantities that can typically be observed in real data applications. The algorithm is validated in an extensive simulation study and is compared with several existing variable selection procedures. Finally, we study an extension of our approach to the case in which an ordering of the variables is not available.
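To make the setting in the abstract concrete, the following is a minimal sketch of ordered variable selection by a penalized least-squares criterion. It is not the authors' calibration algorithm. It assumes nested candidate models (model m uses the first m variables), a penalty of the form K * sigma2 * m with K the multiplicative constant, and a known noise variance sigma2. The function names `select_ordered_model` and `false_discovery_proportion` are hypothetical.

```python
import numpy as np


def select_ordered_model(X, y, K, sigma2):
    """Pick the nested model size minimizing a penalized least-squares criterion.

    Assumptions (illustrative, not taken from the paper):
    - variables are ordered, so candidate model m uses the first m columns of X;
    - the criterion is ||y - fit_m||^2 + K * sigma2 * m;
    - sigma2 (noise variance) is known.
    Returns the selected size and the criterion values for sizes 0..min(n, p).
    """
    n, p = X.shape
    max_size = min(n, p)
    crit = np.empty(max_size + 1)
    crit[0] = float(y @ y)  # empty model: no variable selected, no penalty
    for m in range(1, max_size + 1):
        # Least-squares fit on the first m ordered variables.
        beta, *_ = np.linalg.lstsq(X[:, :m], y, rcond=None)
        resid = y - X[:, :m] @ beta
        crit[m] = float(resid @ resid) + K * sigma2 * m
    return int(np.argmin(crit)), crit


def false_discovery_proportion(m_hat, m_true):
    """Share of selected variables that are inactive.

    With ordered variables, the selected set is the first m_hat variables and
    the active set is assumed to be the first m_true variables.
    """
    return max(m_hat - m_true, 0) / max(m_hat, 1)
```

In this sketch, a larger K selects a smaller model, which lowers the false discovery proportion but can discard active variables and hurt prediction. That is the tension the calibration of the constant has to resolve.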
Pages: 2886-2930
Number of pages: 45