High dimensional variable selection via tilting

Cited by: 56
Authors
Cho, Haeran [1]
Fryzlewicz, Piotr [1]
Affiliation
[1] Univ London London Sch Econ & Polit Sci, Dept Stat, London WC2A 2AE, England
Keywords
Adaptivity; Correlation; Hard thresholding; High dimensionality; Linear regression; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION; REGRESSION; LASSO; CLASSIFICATION; DISCOVERY
DOI
10.1111/j.1467-9868.2011.01023.x
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly spurious) high correlations between the variables, which result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations between the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
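The per-variable adaptive choice described in the abstract can be illustrated with a minimal sketch. It assumes that covariates and response are centered and scaled to unit norm, so that inner products are sample correlations, and that the hard threshold `threshold` is fixed by hand (the abstract calls the overall procedure data-driven; how the paper chooses the threshold is not reproduced here). For variable j, the j-th row of the sample correlation matrix is hard-thresholded. If no other variable survives, the marginal correlation is used. Otherwise the surviving variables are projected out and X_j^T (I - Pi_j) y is computed, where Pi_j is the projection onto their span. Dividing by ||(I - Pi_j) X_j||^2 is an assumption of this sketch, not necessarily the paper's rescaling: by the Frisch-Waugh-Lovell theorem it makes the score equal to the coefficient of X_j in a least-squares fit of y on X_j together with its strongly correlated variables. The function names are hypothetical, and the paper's iterative screening algorithm is not shown.

```python
import math


def _standardize(v):
    """Center a vector and scale it to unit Euclidean norm."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0.0:
        raise ValueError("constant vector cannot be standardized")
    return [x / norm for x in centered]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(v, basis):
    """Return (I - P) v, where P projects onto the span of an orthonormal basis."""
    r = list(v)
    for q in basis:
        coef = _dot(r, q)
        r = [ri - coef * qi for ri, qi in zip(r, q)]
    return r


def _orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt; drops (numerically) linearly dependent vectors."""
    basis = []
    for v in vectors:
        r = _residualize(v, basis)
        norm = math.sqrt(_dot(r, r))
        if norm > tol:
            basis.append([x / norm for x in r])
    return basis


def marginal_correlations(columns, y):
    """Sample correlation of each covariate with the response."""
    yc = _standardize(y)
    return [_dot(_standardize(col), yc) for col in columns]


def tilted_scores(columns, y, threshold):
    """Hypothetical helper: marginal correlation or a tilted score per variable.

    columns: list of p covariate vectors, each of length n.
    threshold: hard-thresholding level applied to |sample correlation|.
    """
    X = [_standardize(col) for col in columns]
    yc = _standardize(y)
    scores = []
    for j, xj in enumerate(X):
        # Hard-threshold the j-th row of the sample correlation matrix.
        strong = [X[k] for k in range(len(X))
                  if k != j and abs(_dot(xj, X[k])) > threshold]
        if not strong:
            scores.append(_dot(xj, yc))  # no strong correlations: marginal
            continue
        # Project out the strongly correlated variables: (I - Pi_j) X_j.
        xr = _residualize(xj, _orthonormal_basis(strong))
        denom = _dot(xr, xr)
        if denom < 1e-12:
            scores.append(0.0)  # X_j lies (numerically) in their span
            continue
        # X_j^T (I - Pi_j) y, rescaled by ||(I - Pi_j) X_j||^2 (assumed rule).
        scores.append(_dot(xr, yc) / denom)
    return scores


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]    # nearly a copy of x1
    x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
    y = [2.0 * v + 1.0 for v in x1]        # response depends on x1 only
    print(marginal_correlations([x1, x2, x3], y))
    print(tilted_scores([x1, x2, x3], y, threshold=0.5))
```

In this toy example the irrelevant x2 has a marginal correlation of about 0.996 with y purely because it nearly duplicates x1. Its tilted score, computed after projecting out x1, is essentially 0, while x1 keeps a score of about 1. The weakly correlated x3 falls below the threshold for both other variables and keeps its marginal correlation (about -0.098).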
Pages: 593-622
Number of pages: 30
Related papers (50 in total)
  • [31] Variable selection and parameter estimation via WLAD-SCAD with a diverging number of parameters
    Wang, Yanxin
    Zhu, Li
    JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2017, 46 (03) : 390 - 403
  • [32] High-Dimensional Variable Selection for Survival Data
    Ishwaran, Hemant
    Kogalur, Udaya B.
    Gorodeski, Eiran Z.
    Minn, Andy J.
    Lauer, Michael S.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2010, 105 (489) : 205 - 217
  • [33] SEQUENTIAL, BOTTOM-UP VARIABLE SELECTION FOR HIGH-DIMENSIONAL CLASSIFICATION
    Hall, Peter
    Miller, Hugh
    AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2010, 52 (04) : 403 - 421
  • [34] A Metropolized Adaptive Subspace Algorithm for High-Dimensional Bayesian Variable Selection
    Staerk, Christian
    Kateri, Maria
    Ntzoufras, Ioannis
    BAYESIAN ANALYSIS, 2024, 19 (01) : 261 - 291
  • [35] Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems
    Hall, Peter
    Miller, Hugh
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2009, 18 (03) : 533 - 550
  • [36] LASSO-type variable selection methods for high-dimensional data
    Fu, Guanghui
    Wang, Pan
    ADVANCES IN COMPUTATIONAL MODELING AND SIMULATION, PTS 1 AND 2, 2014, 444-445 : 604 - 609
  • [37] Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data
    Zhao, Yize
    Kang, Jian
    Long, Qi
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2018, 15 (02) : 537 - 550
  • [38] High-Dimensional Variable Selection With Reciprocal L1-Regularization
    Song, Qifan
    Liang, Faming
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (512) : 1607 - 1620
  • [39] Determining and Depicting Relationships Among Components in High-Dimensional Variable Selection
    Hall, Peter
    Miller, Hugh
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (04) : 988 - 1006
  • [40] High-dimensional variable selection with heterogeneous signals: A precise asymptotic perspective
    Roy, Saptarshi
    Tewari, Ambuj
    Zhu, Ziwei
    BERNOULLI, 2025, 31 (02) : 1206 - 1229