High dimensional variable selection via tilting

Cited by: 56
Authors
Cho, Haeran [1 ]
Fryzlewicz, Piotr [1 ]
Affiliations
[1] Univ London London Sch Econ & Polit Sci, Dept Stat, London WC2A 2AE, England
Keywords
Adaptivity; Correlation; Hard thresholding; High dimensionality; Linear regression; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION; REGRESSION; LASSO; CLASSIFICATION; DISCOVERY;
DOI
10.1111/j.1467-9868.2011.01023.x
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly spurious) high correlations between the variables, which result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations between the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
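The adaptive choice described in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the idea, not the authors' exact definition: the function name `tilted_scores`, the `threshold` parameter, and the rescaling by the residual norm of each covariate are all assumptions made for illustration. For each variable, the sample correlation matrix is hard thresholded; if no other variable survives the threshold, the marginal score is used, and otherwise both the variable and the response are projected onto the orthogonal complement of the highly correlated variables before the score is computed.

```python
import numpy as np

def tilted_scores(X, y, threshold):
    """Illustrative per-variable scores (hypothetical helper, not the paper's code).

    X: (n, p) design matrix, y: (n,) response,
    threshold: hard-threshold level for absolute sample correlations (assumed name).
    """
    n, p = X.shape
    # Centre and scale columns to unit norm, so Xs.T @ Xs is the sample correlation matrix.
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()
    C = Xs.T @ Xs

    scores = np.empty(p)
    for j in range(p):
        # Hard thresholding: keep only variables highly correlated with X_j.
        mask = np.abs(C[j]) > threshold
        mask[j] = False
        if not mask.any():
            # No highly correlated variables: fall back to the marginal score.
            scores[j] = Xs[:, j] @ yc
            continue
        Z = Xs[:, mask]
        # Residuals of X_j and y after least-squares projection onto span(Z).
        coef_x, *_ = np.linalg.lstsq(Z, Xs[:, j], rcond=None)
        coef_y, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        rx = Xs[:, j] - Z @ coef_x
        ry = yc - Z @ coef_y
        denom = rx @ rx
        # Assumed rescaling: divide by the squared residual norm of X_j.
        scores[j] = (rx @ ry) / denom if denom > 1e-12 else 0.0
    return scores
```

With this rescaling the marginal and tilted branches are on the same scale, since a unit-norm column with nothing projected out has a squared residual norm of one. The paper studies its own rescaling rules and an iterative screening algorithm, neither of which is reproduced here.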
Pages: 593-622
Number of pages: 30