ROBUST INFERENCE WITH KNOCKOFFS

Citations: 60
Authors
Barber, Rina Foygel [1]
Candes, Emmanuel J. [2, 3]
Samworth, Richard J. [4]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Math, Stanford, CA 94305 USA
[4] Univ Cambridge, Stat Lab, Cambridge, England
Funding
Engineering and Physical Sciences Research Council (UK); National Science Foundation (US)
Keywords
Knockoffs; variable selection; false discovery rate (FDR); high-dimensional regression; robustness; FALSE DISCOVERY RATE; COVARIANCE ESTIMATION; GENOTYPE IMPUTATION; ALGORITHM; MODEL;
DOI
10.1214/19-AOS1852
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider the variable selection problem, which seeks to identify important variables influencing a response Y out of many candidate features X_1, ..., X_p. We wish to do so while offering finite-sample guarantees about the fraction of false positives, that is, selected variables X_j that in fact have no effect on Y after the other features are known. When the number of features p is large (perhaps even larger than the sample size n), and we have no prior knowledge regarding the type of dependence between Y and X, the model-X knockoffs framework nonetheless allows us to select a model with a guaranteed bound on the false discovery rate, as long as the distribution of the feature vector X = (X_1, ..., X_p) is exactly known. This model selection procedure operates by constructing "knockoff copies" of each of the p features, which are then used as a control group to ensure that the model selection algorithm is not choosing too many irrelevant features. In this work, we study the practical setting where the distribution of X can only be estimated, rather than known exactly, and the knockoff copies of the X_j's are therefore constructed somewhat incorrectly. Our results, which are free of any modeling assumption whatsoever, show that the resulting model selection procedure incurs an inflation of the false discovery rate that is proportional to our errors in estimating the distribution of each feature X_j conditional on the remaining features {X_k : k ≠ j}. The model-X knockoffs framework is therefore robust to errors in the underlying assumptions on the distribution of X, making it an effective method for many practical applications, such as genome-wide association studies, where the underlying distribution on the features X_1, ..., X_p is estimated accurately but not known exactly.
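The "control group" idea in the abstract can be illustrated with the selection step of the knockoff filter. The sketch below is not taken from this record; it assumes the standard knockoff+ threshold from the original knockoff filter paper (Barber and Candes, 2015) and assumes that a vector of feature statistics W has already been computed, where a large positive W_j means that feature X_j looks more important than its knockoff copy and a negative W_j means the knockoff looked more important. The function name `knockoff_plus_select` and the parameter names are hypothetical.

```python
import numpy as np


def knockoff_plus_select(W, q):
    """Select features with the knockoff+ threshold (illustrative sketch).

    W : feature statistics, one per feature (assumed precomputed).
    q : target false discovery rate level.
    Returns the indices j with W_j >= T, where T is the smallest
    candidate threshold whose estimated FDP is at most q.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics act as the control group: their count
        # (plus one) estimates the number of false positives among
        # the features with W_j >= t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int)
```

For example, calling `knockoff_plus_select(W, q=0.1)` returns the indices of the selected features, or an empty array when no threshold achieves the target level. How W is computed, and how the knockoff copies are sampled from an estimated distribution of X, is outside the scope of this sketch.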
Pages: 1409-1431
Page count: 23