FDR control and power analysis for high-dimensional logistic regression via StabKoff

Cited by: 2
Authors
Yuan, Panxu [1 ]
Kong, Yinfei [2 ]
Li, Gaorong [1 ]
Affiliations
[1] Beijing Normal Univ, Sch Stat, Beijing 100875, Peoples R China
[2] Calif State Univ Fullerton, Coll Business & Econ, Dept Informat Syst & Decis Sci, Fullerton, CA 92831 USA
Funding
National Natural Science Foundation of China
Keywords
False discovery rate; Logistic regression; Power analysis; Stability knockoffs; Variable selection; SUBSTANCE-ABUSE TREATMENT; FALSE DISCOVERY RATE; VARIABLE SELECTION; MODELS;
DOI
10.1007/s00362-023-01501-5
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification
020208; 070103; 0714
Abstract
Identifying significant variables for the high-dimensional logistic regression model is a fundamental problem in modern statistics and machine learning. This paper introduces a stability knockoffs (StabKoff) selection procedure by merging stability selection and knockoffs to conduct controlled variable selection for logistic regression. Under some regularity conditions, we show that the proposed method achieves FDR control under the finite-sample setting, and the power also asymptotically approaches one as the sample size tends to infinity. In addition, we further develop an intersection strategy that allows better separation of knockoff statistics between significant and unimportant variables, which in some cases leads to an increase in power. The simulation studies demonstrate that the proposed method possesses satisfactory finite-sample performance compared with existing methods in terms of both FDR and power. We also apply the proposed method to a real data set on opioid use disorder treatment.
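The knockoff filter that StabKoff builds on can be illustrated with a minimal sketch (this is not the authors' exact procedure, which additionally incorporates stability selection and an intersection strategy). Under an independent standard-Gaussian design, fresh Gaussian draws are valid model-X knockoffs; a marginal-correlation difference serves as a simple antisymmetric knockoff statistic; and the knockoff+ threshold controls the FDR at a target level q. All variable names and the choice of statistic here are illustrative assumptions.

```python
import numpy as np

def knockoff_threshold(W, q):
    """Knockoff+ threshold: smallest t with (1 + #{W_j <= -t}) / #{W_j >= t} <= q."""
    ts = np.sort(np.abs(W[W != 0]))
    for t in ts:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # nothing can be selected at level q

rng = np.random.default_rng(0)
n, p, k = 500, 50, 10
X = rng.standard_normal((n, p))            # independent features, Sigma = I
beta = np.zeros(p)
beta[:k] = 2.0                             # first k variables are significant
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))   # logistic responses

# With Sigma = I, an independent Gaussian copy is a valid model-X knockoff matrix.
X_ko = rng.standard_normal((n, p))

# Antisymmetric statistic: original vs. knockoff marginal association with y.
yc = y - y.mean()
W = np.abs(X.T @ yc) - np.abs(X_ko.T @ yc)

tau = knockoff_threshold(W, q=0.2)
selected = np.where(W >= tau)[0]           # indices passing the knockoff+ filter
```

In practice one would replace the marginal statistic with, e.g., the lasso coefficient difference from an L1-penalized logistic fit on the augmented design [X, X_ko], which is the typical choice for logistic regression.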
Pages: 2719-2749
Page count: 31
Related Papers
50 in total
  • [21] Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model
    Kim, Hyunjin
    Lee, Eun Ryung
    Park, Seyoung
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [22] Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression
    Insolia, Luca
    Kenney, Ana
    Calovi, Martina
    Chiaromonte, Francesca
STATS, 2021, 4 (03) : 665 - 681
  • [23] SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
    Yadlowsky, Steve
    Yun, Taedong
    McLean, Cory
    D'Amour, Alexander
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [24] High-dimensional linear regression via implicit regularization
    Zhao, Peng
    Yang, Yun
    He, Qiao-Chu
    BIOMETRIKA, 2022, 109 (04) : 1033 - 1046
  • [25] Post-selection Inference of High-dimensional Logistic Regression Under Case-Control Design
    Lin, Yuanyuan
    Xie, Jinhan
    Han, Ruijian
    Tang, Niansheng
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2023, 41 (02) : 624 - 635
  • [26] High-dimensional regression analysis with treatment comparisons
    Lue, Heng-Hui
    You, Bing-Ran
    COMPUTATIONAL STATISTICS, 2013, 28 (03) : 1299 - 1317
  • [27] Subgroup analysis for high-dimensional functional regression
    Zhang, Xiaochen
    Zhang, Qingzhao
    Ma, Shuangge
    Fang, Kuangnan
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 192
  • [29] Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data
    Alharthi, Aiedh Mrisi
    Lee, Muhammad Hisyam
    Algamal, Zakariya Yahya
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2022, 18 (02) : 40 - 54
  • [30] Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data
    Bzdok, Danilo
    Eickenberg, Michael
    Grisel, Olivier
    Thirion, Bertrand
    Varoquaux, Gael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28