Conditional Sure Independence Screening

Cited by: 96
Authors
Barut, Emre [1]
Fan, Jianqing [2, 3]
Verhasselt, Anneleen [4]
Affiliations
[1] George Washington Univ, Dept Stat, Washington, DC 20052 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[4] Univ Hasselt, CenStat, Interuniv Inst Biostat & Stat Bioinformat, Hasselt, Belgium
Funding
National Science Foundation (USA);
Keywords
False selection rate; Generalized linear models; Sparsity; Sure screening; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; GENERALIZED LINEAR-MODELS; VARIABLE SELECTION; NP-DIMENSIONALITY; DANTZIG SELECTOR; REGRESSION; LASSO;
DOI
10.1080/01621459.2015.1092974
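Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract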
Independence screening is powerful for variable selection when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or their variants. When some prior knowledge on a certain important set of variables is available, a natural assessment of the relative importance of the other predictors is their conditional contributions to the response given the known set of variables. This results in conditional sure independence screening (CSIS). CSIS produces a rich family of alternative screening methods by different choices of the conditioning set and can help reduce the number of false positive and false negative selections when covariates are highly correlated. This article proposes and studies CSIS in generalized linear models. We give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency and the properties of CSIS when a data-driven conditioning set is used. Moreover, we provide two data-driven methods to select the thresholding parameter of conditional screening. The utility of the procedure is illustrated by simulation studies and analysis of two real datasets. Supplementary materials for this article are available online.
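The conditional-contribution idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest special case, a linear model fitted by least squares, whereas the article treats generalized linear models. The function name `csis_linear`, the parameter `keep`, and the simulated data are hypothetical and not taken from the article. A fixed number of retained variables stands in for the article's data-driven threshold selection, which is not implemented here.

```python
import numpy as np

def csis_linear(X, y, cond_idx, keep):
    """Rank variables outside cond_idx by |coefficient| of X_j
    in the least-squares fit of y on (intercept, X_C, X_j)."""
    n, p = X.shape
    # Standardize columns so coefficient magnitudes are comparable.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cond_idx = list(cond_idx)
    base = np.column_stack([np.ones(n), Xs[:, cond_idx]])
    scores = {}
    for j in range(p):
        if j in cond_idx:
            continue
        design = np.column_stack([base, Xs[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        scores[j] = abs(coef[-1])  # conditional contribution of X_j
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# Simulated example (assumed setup): X1 is correlated with X0 and is
# constructed so that its marginal covariance with y is zero, yet it
# matters once X0 is conditioned on.
rng = np.random.default_rng(0)
n, p, rho = 500, 52, 0.8
X = rng.standard_normal((n, p))
X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 1]
y = X[:, 0] - rho * X[:, 1] + 0.5 * rng.standard_normal(n)

selected, scores = csis_linear(X, y, cond_idx=[0], keep=1)
print(selected)
```

In this setup a purely marginal ranking would tend to overlook X1, because its population covariance with the response is zero, while conditioning on X0 exposes its contribution. The choice of conditioning set here is given by hand; the article also studies data-driven conditioning sets.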
Pages: 1266-1277
Number of pages: 12