A Simple Information Criterion for Variable Selection in High-Dimensional Regression

Cited by: 0
Authors
Pluntz, Matthieu [1]
Dalmasso, Cyril [2]
Tubert-Bitter, Pascale [1]
Ahmed, Ismail [1]
Affiliations
[1] Univ Paris Sud, Univ Paris Saclay, High Dimens Biostat Drug Safety & Genom, UVSQ,Inserm,CESP, Villejuif, France
[2] Univ Evry Val Essonne, Lab Math & Modelisat Evry LaMME, Evry, France
Funding
National Natural Science Foundation of China;
Keywords
FWER control; high-dimensional regression; information criterion; LASSO; pharmacovigilance; variable selection; MODEL SELECTION; REGULARIZATION; LIKELIHOOD; RISK;
DOI
10.1002/sim.10275
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
High-dimensional regression problems, for example with genomic or drug exposure data, typically involve automated selection of a sparse set of regressors. Penalized regression methods like the LASSO can deliver a family of candidate sparse models. To select one, there are criteria balancing log-likelihood and model size, the most common being AIC and BIC. These two methods do not take into account the implicit multiple testing performed when selecting variables in a high-dimensional regression, which makes them too liberal. We propose the extended AIC (EAIC), a new information criterion for sparse model selection in high-dimensional regressions. It allows for asymptotic FWER control when the candidate regressors are independent. It is based on a simple formula involving model log-likelihood, model size, the total number of candidate regressors, and the FWER target. In a simulation study over a wide range of linear and logistic regression settings, we assessed the variable selection performance of the EAIC and of other information criteria (including some that also use the number of candidate regressors: mBIC, mAIC, and EBIC) in conjunction with the LASSO. Our method controls the FWER in nearly all settings, in contrast to the AIC and BIC, which produce many false positives. We also illustrate it for the automated signal detection of adverse drug reactions on the French pharmacovigilance spontaneous reporting database.
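As a rough illustration of the baseline the abstract criticizes, the sketch below scores a family of candidate sparse models with the standard AIC (2k - 2 log L) and BIC (k log n - 2 log L) and picks the minimizer of each. It does not implement the EAIC, whose formula is not given in this record. The candidate labels, log-likelihoods, model sizes, and sample size are hypothetical values chosen only for the example.

```python
import math

def aic(loglik, k):
    """Standard AIC: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Standard BIC: k * log(n) - 2 * log-likelihood (lower is better)."""
    return k * math.log(n) - 2 * loglik

def select(candidates, n):
    """Return the labels of the AIC-best and BIC-best candidates.

    candidates: list of (label, loglik, k) tuples, e.g. one per model
    on a LASSO regularization path; n: number of observations.
    """
    best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
    best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n))
    return best_aic[0], best_bic[0]

# Hypothetical candidates (label, log-likelihood, number of selected regressors).
candidates = [
    ("size 1", -520.0, 1),
    ("size 3", -505.0, 3),
    ("size 8", -498.0, 8),
    ("size 20", -490.0, 20),
]
n_obs = 200  # hypothetical sample size

chosen_by_aic, chosen_by_bic = select(candidates, n_obs)
print("AIC picks:", chosen_by_aic)
print("BIC picks:", chosen_by_bic)
```

Neither penalty depends on the total number of candidate regressors, which is the quantity the abstract says the EAIC, mBIC, mAIC, and EBIC bring into the formula.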
Pages: 12