High dimensional variable selection via tilting

Cited by: 56
Authors
Cho, Haeran [1 ]
Fryzlewicz, Piotr [1 ]
Affiliations
[1] Department of Statistics, London School of Economics and Political Science, London WC2A 2AE, England
Keywords
Adaptivity; Correlation; Hard thresholding; High dimensionality; Linear regression; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; MODEL SELECTION; REGRESSION; LASSO; CLASSIFICATION; DISCOVERY;
DOI
10.1111/j.1467-9868.2011.01023.x
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly spurious) high correlations between the variables, which result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations between the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
Pages: 593 - 622
Page count: 30
Related papers
50 items in total
  • [41] Group variable selection via SCAD-L2
    Zeng, Lingmin
    Xie, Jun
    STATISTICS, 2014, 48 (01) : 49 - 66
  • [42] High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking
    Wang, Fan
    Mukherjee, Sach
    Richardson, Sylvia
    Hill, Steven M.
    STATISTICS AND COMPUTING, 2020, 30 (03) : 697 - 719
  • [43] Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression
    Latouche, Pierre
    Mattei, Pierre-Alexandre
    Bouveyron, Charles
    Chiquet, Julien
    JOURNAL OF MULTIVARIATE ANALYSIS, 2016, 146 : 177 - 190
  • [44] Consistent tuning parameter selection in high dimensional sparse linear regression
    Wang, Tao
    Zhu, Lixing
    JOURNAL OF MULTIVARIATE ANALYSIS, 2011, 102 (07) : 1141 - 1151
  • [45] High-dimensional linear model selection motivated by multiple testing
    Furmanczyk, Konrad
    Rejchel, Wojciech
    STATISTICS, 2020, 54 (01) : 152 - 166
  • [46] Variable Selection via SCAD-Penalized Quantile Regression for High-Dimensional Count Data
    Khan, Dost Muhammad
    Yaqoob, Anum
    Iqbal, Nadeem
    Wahid, Abdul
    Khalil, Umair
    Khan, Mukhtaj
    Abd Rahman, Mohd Amiruddin
    Mustafa, Mohd Shafie
    Khan, Zardad
    IEEE ACCESS, 2019, 7 : 153205 - 153216
  • [47] Nonnegative estimation and variable selection via adaptive elastic-net for high-dimensional data
    Li, Ning
    Yang, Hu
    Yang, Jing
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (12) : 4263 - 4279
  • [48] Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
    Xue, Lan
    Qu, Annie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 1973 - 1998
  • [49] Structural identification and variable selection in high-dimensional varying-coefficient models
    Chen, Yuping
    Bai, Yang
    Fung, Wingkam
    JOURNAL OF NONPARAMETRIC STATISTICS, 2017, 29 (02) : 258 - 279
  • [50] Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors
    Nikooienejad, Amir
    Wang, Wenyi
    Johnson, Valen E.
    BIOINFORMATICS, 2016, 32 (09) : 1338 - 1345