High dimensional variable selection via tilting

Cited by: 56
Authors
Cho, Haeran [1 ]
Fryzlewicz, Piotr [1 ]
Affiliations
[1] Univ London, London Sch Econ & Polit Sci, Dept Stat, London WC2A 2AE, England
Keywords
Adaptivity; Correlation; Hard thresholding; High dimensionality; Linear regression; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION; REGRESSION; LASSO; CLASSIFICATION; DISCOVERY;
DOI
10.1111/j.1467-9868.2011.01023.x
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly spurious) high correlations between the variables, which result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations between the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
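The abstract's two key ingredients — hard thresholding the sample correlation matrix of the design, and "tilting" each variable's correlation with the response by projecting out its highly correlated peers — can be sketched as follows. This is an illustrative reading of the procedure, not the authors' exact algorithm; the function name `tilted_correlations` and the threshold value are assumptions, and the paper's adaptive choice and iterative screening steps are omitted.

```python
import numpy as np

def tilted_correlations(X, y, threshold=0.3):
    """Illustrative sketch (assumed form): for each variable, hard-threshold
    its sample correlations with the other columns; if none survive, use the
    marginal correlation with y, otherwise correlate y with the component of
    the variable orthogonal to its highly correlated peers."""
    n, p = X.shape
    # Column-standardise so that inner products are sample correlations.
    X = (X - X.mean(axis=0)) / X.std(axis=0) / np.sqrt(n)
    y = (y - y.mean()) / (y.std() * np.sqrt(n))
    C = X.T @ X                      # sample correlation matrix of the design
    scores = np.empty(p)
    for j in range(p):
        # Peers of X_j surviving the hard threshold on |correlation|.
        S = [k for k in range(p) if k != j and abs(C[j, k]) > threshold]
        if not S:
            # No strong peers: fall back to the marginal correlation.
            scores[j] = X[:, j] @ y
        else:
            # Project X_j onto the orthogonal complement of span{X_k : k in S}.
            Q, _ = np.linalg.qr(X[:, S])
            xj = X[:, j] - Q @ (Q.T @ X[:, j])
            scores[j] = xj @ y / np.linalg.norm(xj)
    return scores
```

Because both the residualised variable and the response are rescaled to unit norm, each score lies in [-1, 1] and is directly comparable across variables, which is the property that lets a single ranking mix marginal and tilted correlations.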
Pages: 593 - 622 (30 pages)
Related papers (50 in total)
  • [21] Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions
    Bondell, Howard D.
    Reich, Brian J.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2012, 107 (500) : 1610 - 1624
  • [22] An Improved Forward Regression Variable Selection Algorithm for High-Dimensional Linear Regression Models
    Xie, Yanxi
    Li, Yuewen
    Xia, Zhijie
    Yan, Ruixia
    IEEE ACCESS, 2020, 8 (08) : 129032 - 129042
  • [23] Variable selection and estimation for high-dimensional spatial autoregressive models
    Cai, Liqian
    Maiti, Tapabrata
    SCANDINAVIAN JOURNAL OF STATISTICS, 2020, 47 (02) : 587 - 607
  • [24] Variable selection for longitudinal data with high-dimensional covariates and dropouts
    Zheng, Xueying
    Fu, Bo
    Zhang, Jiajia
    Qin, Guoyou
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (04) : 712 - 725
  • [25] FACTOR MODELS AND VARIABLE SELECTION IN HIGH-DIMENSIONAL REGRESSION ANALYSIS
    Kneip, Alois
    Sarda, Pascal
    ANNALS OF STATISTICS, 2011, 39 (05) : 2410 - 2447
  • [26] A Simple Information Criterion for Variable Selection in High-Dimensional Regression
    Pluntz, Matthieu
    Dalmasso, Cyril
    Tubert-Bitter, Pascale
    Ahmed, Ismail
    STATISTICS IN MEDICINE, 2025, 44 (1-2)
  • [27] Optimized variable selection via repeated data splitting
    Capanu, Marinela
    Giurcanu, Mihai
    Begg, Colin B.
    Gonen, Mithat
    STATISTICS IN MEDICINE, 2020, 39 (16) : 2167 - 2184
  • [28] Laplace Error Penalty-based Variable Selection in High Dimension
    Wen, Canhong
    Wang, Xueqin
    Wang, Shaoli
    SCANDINAVIAN JOURNAL OF STATISTICS, 2015, 42 (03) : 685 - 700
  • [29] Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods
    Stefanski, L. A.
    Wu, Yichao
    White, Kyle
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (506) : 574 - 589
  • [30] High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach
    Garcia-Torres, Miguel
    Gomez-Vela, Francisco
    Melian-Batista, Belen
    Marcos Moreno-Vega, J.
    INFORMATION SCIENCES, 2016, 326 : 102 - 118