Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection

Cited by: 0
Authors
Lacroix, Perrine [1, 2, 3]
Martin, Marie-Laure [2, 3, 4]
Affiliations
[1] Univ Paris Saclay, Lab Math Orsay, CNRS, Orsay, France
[2] Univ Paris Saclay, Univ Evry, Inst Plant Sci Paris Saclay IPS2, CNRS, F-91190 Gif Sur Yvette, France
[3] Univ Paris Cite, Inst Plant Sci Paris Saclay IPS2, F-91190 Gif Sur Yvette, France
[4] Univ Paris Saclay, AgroParisTech, INRAE, UMR MIA Paris Saclay, F-91120 Palaiseau, France
Source
ELECTRONIC JOURNAL OF STATISTICS | 2024 / Vol. 18 / Issue 02
Keywords
Ordered variable selection; prediction; FDR; high-dimension; Gaussian regression; hyperparameter calibration; FALSE DISCOVERY RATE; REGRESSION; STABILITY; INFERENCE; SPARSITY
DOI
10.1214/24-EJS2260
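Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract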
In the context of high-dimensional Gaussian linear regression for ordered variables, we study the variable selection procedure via the minimization of the penalized least-squares criterion. We focus on model selection where the penalty function depends on an unknown multiplicative constant commonly calibrated for prediction. We propose a new proper calibration of this hyperparameter to simultaneously control predictive risk and false discovery rate (FDR). We obtain non-asymptotic bounds on the FDR with respect to the hyperparameter and we provide an algorithm to calibrate it. This algorithm is based on quantities that can typically be observed in real data applications. The algorithm is validated in an extensive simulation study and is compared with several existing variable selection procedures. Finally, we study an extension of our approach to the case in which an ordering of the variables is not available.
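To make the setting in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a penalty of the form K * sigma2 * d (d being the number of leading variables kept), which is one common shape for such criteria; the abstract itself does not state the penalty. The function names `select_ordered_model` and `false_discovery_proportion` are hypothetical, and the calibration of K proposed in the paper is not reproduced here.

```python
import numpy as np


def select_ordered_model(X, y, K, sigma2):
    """Pick how many leading (ordered) columns of X to keep.

    Assumed criterion (an illustration, not taken from the paper):
        crit(d) = ||y - fit on first d columns||^2 + K * sigma2 * d
    """
    n, p = X.shape
    # Empty model: nothing is fitted, so the residual is y itself and no penalty applies.
    best_dim, best_crit = 0, float(y @ y)
    for d in range(1, min(n, p) + 1):
        # Least-squares fit on the first d variables of the ordered collection.
        coef, *_ = np.linalg.lstsq(X[:, :d], y, rcond=None)
        resid = y - X[:, :d] @ coef
        crit = float(resid @ resid) + K * sigma2 * d
        if crit < best_crit:
            best_dim, best_crit = d, crit
    return best_dim


def false_discovery_proportion(selected_dim, true_dim):
    """Share of selected variables that are inactive, when the active
    variables are exactly the first `true_dim` ones in the ordering."""
    if selected_dim == 0:
        return 0.0
    return max(selected_dim - true_dim, 0) / selected_dim


if __name__ == "__main__":
    # Simulated data (made up for illustration): 3 active variables out of 40.
    rng = np.random.default_rng(0)
    n, p, true_dim = 100, 40, 3
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:true_dim] = 2.0
    y = X @ beta + rng.standard_normal(n)
    for K in (0.5, 2.0, 8.0):
        d_hat = select_ordered_model(X, y, K, sigma2=1.0)
        print(K, d_hat, false_discovery_proportion(d_hat, true_dim))
```

In this sketch, a larger K penalizes model size more heavily, so the selected model cannot grow as K increases; this is the lever behind the trade-off between prediction and false discoveries that the abstract describes. The FDR is the expectation of the proportion computed by the second function.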
Pages: 2886-2930
Number of pages: 45