FDR control and power analysis for high-dimensional logistic regression via StabKoff

Cited: 2
Authors
Yuan, Panxu [1 ]
Kong, Yinfei [2 ]
Li, Gaorong [1 ]
Affiliations
[1] Beijing Normal Univ, Sch Stat, Beijing 100875, Peoples R China
[2] Calif State Univ Fullerton, Coll Business & Econ, Dept Informat Syst & Decis Sci, Fullerton, CA 92831 USA
Funding
National Natural Science Foundation of China
Keywords
False discovery rate; Logistic regression; Power analysis; Stability knockoffs; Variable selection; Substance abuse treatment; Models
DOI
10.1007/s00362-023-01501-5
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Identifying significant variables in high-dimensional logistic regression is a fundamental problem in modern statistics and machine learning. This paper introduces a stability knockoffs (StabKoff) procedure that merges stability selection with knockoffs to perform controlled variable selection for logistic regression. Under some regularity conditions, we show that the proposed method controls the FDR in finite samples and that its power approaches one asymptotically as the sample size tends to infinity. We further develop an intersection strategy that better separates the knockoff statistics of significant and unimportant variables, which in some cases increases power. Simulation studies demonstrate that the proposed method has satisfactory finite-sample performance relative to existing methods in terms of both FDR and power. We also apply the method to a real data set on opioid use disorder treatment.
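Knockoff-based FDR control of the kind summarized above rests on a data-dependent threshold for knockoff statistics W_j, where large positive values suggest signal variables. A minimal sketch of the standard knockoff+ threshold rule in Python with NumPy, on toy statistics; the function name and the W values are illustrative, and this is the generic knockoff filter, not the paper's StabKoff construction or its intersection strategy:

```python
import numpy as np

def knockoff_plus_threshold(W, q=0.1):
    """Knockoff+ threshold: the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q."""
    candidates = np.sort(np.abs(W[W != 0]))  # candidate thresholds
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

# Toy knockoff statistics; indices 0-3 and 9 mimic true signals.
W = np.array([5.0, 4.0, 3.5, 3.0, -0.5, 0.4, -0.3, 0.2, -0.1, 2.5])
t = knockoff_plus_threshold(W, q=0.2)
selected = np.where(W >= t)[0]  # variables declared significant
```

The "+1" in the numerator is what yields finite-sample (rather than merely asymptotic) FDR control for the knockoff+ variant.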
Pages: 2719-2749
Page count: 31
Related Papers
50 records
  • [1] High-Dimensional Classification by Sparse Logistic Regression
    Abramovich, Felix
    Grinshtein, Vadim
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2019, 65 (05) : 3068 - 3079
  • [2] The Impact of Regularization on High-dimensional Logistic Regression
    Salehi, Fariborz
    Abbasi, Ehsan
    Hassibi, Babak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Finite population Bayesian bootstrapping in high-dimensional classification via logistic regression
    Zarei, Shaho
    Mohammadpour, Adel
    Rezakhah, Saeid
    INTELLIGENT DATA ANALYSIS, 2018, 22 (05) : 1115 - 1126
  • [4] A MODEL OF DOUBLE DESCENT FOR HIGH-DIMENSIONAL LOGISTIC REGRESSION
    Deng, Zeyu
    Kammoun, Abla
    Thrampoulidis, Christos
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4267 - 4271
  • [5] Inference for the case probability in high-dimensional logistic regression
    Guo, Zijian
    Rakshit, Prabrisha
    Herman, Daniel S.
    Chen, Jinbo
JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [6] Weak Signals in High-Dimensional Logistic Regression Models
    Reangsephet, Orawan
    Lisawadi, Supranee
    Ahmed, Syed Ejaz
    PROCEEDINGS OF THE THIRTEENTH INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING MANAGEMENT, VOL 1, 2020, 1001 : 121 - 133
  • [7] Robust adaptive LASSO in high-dimensional logistic regression
    Basu, Ayanendranath
    Ghosh, Abhik
    Jaenada, Maria
    Pardo, Leandro
    STATISTICAL METHODS AND APPLICATIONS, 2024,
  • [8] Using synthetic data and dimensionality reduction in high-dimensional classification via logistic regression
    Zarei, Shaho
    Mohammadpour, Adel
    COMPUTATIONAL METHODS FOR DIFFERENTIAL EQUATIONS, 2019, 7 (04): : 626 - 634
  • [9] An Efficient Testing Procedure for High-Dimensional Mediators with FDR Control
    Bai, Xueyan
    Zheng, Yinan
    Hou, Lifang
    Zheng, Cheng
    Liu, Lei
    Zhang, Haixiang
    STATISTICS IN BIOSCIENCES, 2024,
  • [10] Efficient posterior sampling for high-dimensional imbalanced logistic regression
    Sen, Deborshee
    Sachs, Matthias
    Lu, Jianfeng
    Dunson, David B.
    BIOMETRIKA, 2020, 107 (04) : 1005 - 1012