FDR control and power analysis for high-dimensional logistic regression via StabKoff

Times cited: 2
Authors
Yuan, Panxu [1 ]
Kong, Yinfei [2 ]
Li, Gaorong [1 ]
Affiliations
[1] Beijing Normal Univ, Sch Stat, Beijing 100875, Peoples R China
[2] Calif State Univ Fullerton, Coll Business & Econ, Dept Informat Syst & Decis Sci, Fullerton, CA 92831 USA
Funding
National Natural Science Foundation of China
Keywords
False discovery rate; Logistic regression; Power analysis; Stability knockoffs; Variable selection; Substance-abuse treatment; Models
DOI
10.1007/s00362-023-01501-5
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Identifying significant variables in high-dimensional logistic regression models is a fundamental problem in modern statistics and machine learning. This paper introduces a stability knockoffs (StabKoff) selection procedure that merges stability selection with knockoffs to perform controlled variable selection for logistic regression. Under some regularity conditions, we show that the proposed method achieves FDR control in finite samples and that its power approaches one asymptotically as the sample size tends to infinity. In addition, we develop an intersection strategy that better separates the knockoff statistics of significant and unimportant variables, which in some cases increases power. Simulation studies demonstrate that the proposed method attains satisfactory finite-sample performance in terms of both FDR and power compared with existing methods. We also apply the proposed method to a real data set on opioid use disorder treatment.
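The abstract describes the pipeline only at a high level. The sketch below illustrates the general stability-plus-knockoffs recipe it names, under loudly stated assumptions of our own: equicorrelated second-order Gaussian knockoffs, an L1-penalized logistic fit over random subsamples, a selection-frequency difference as the knockoff statistic W_j, and the standard knockoff+ threshold. None of these choices is taken from the paper; the authors' StabKoff construction, statistic, tuning, and intersection strategy may all differ.

```python
# Hypothetical sketch of a stability-knockoff-style selector for logistic
# regression. NOT the authors' StabKoff implementation: the knockoff
# construction, the statistic W_j, and all constants are illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def gaussian_knockoffs(X, Sigma):
    """Equicorrelated second-order Gaussian knockoffs for centered X."""
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = np.full(p, min(1.0, 2.0 * lam_min) * 0.999)  # keeps 2D - D Sig^-1 D PD
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    mean = X - X @ Sigma_inv_D                 # conditional mean of knockoffs
    cov = 2.0 * D - D @ Sigma_inv_D            # conditional covariance
    cov = (cov + cov.T) / 2.0                  # symmetrize numerically
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(p))
    return mean + rng.standard_normal(X.shape) @ L.T

def stability_frequencies(XXk, y, B=50, C=0.5):
    """Selection frequency of each column of [X, X_knockoff] over B subsamples."""
    n, m = XXk.shape
    freq = np.zeros(m)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        fit.fit(XXk[idx], y[idx])
        freq += np.abs(fit.coef_.ravel()) > 1e-8
    return freq / B

def knockoff_plus_threshold(W, q=0.2):
    """Knockoff+ data-dependent threshold targeting FDR level q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

# Toy data: n = 400, p = 40, AR(1) design, first 5 coefficients nonzero.
n, p = 400, 40
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:5] = 1.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(int)

Xk = gaussian_knockoffs(X, Sigma)
freq = stability_frequencies(np.hstack([X, Xk]), y)
W = freq[:p] - freq[p:]            # original minus knockoff selection frequency
T = knockoff_plus_threshold(W)
print("selected variables:", np.where(W >= T)[0])
```

The "+1" in the numerator of the estimated false discovery proportion is the knockoff+ correction, which is what yields finite-sample FDR control in the knockoff framework; the stability-selection layer only changes how the importance statistic W_j is computed.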
Pages: 2719-2749 (31 pages)
Related papers (records [41]-[50] of 50)
[41] Zhang, Shushu; He, Xuming; Tan, Kean Ming; Zhou, Wen-Xin. High-Dimensional Expected Shortfall Regression [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2025.
[42] Sivakumar, Vidyashankar; Banerjee, Arindam. High-Dimensional Structured Quantile Regression [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017.
[43] Shen, Xiaotong; Pan, Wei; Zhu, Yunzhang; Zhou, Hui. On constrained and regularized high-dimensional regression [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2013, 65(5): 807-832.
[44] Jeng, X. Jessie; Chen, Xiongzhi. Predictor ranking and false discovery proportion control in high-dimensional regression [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2019, 171: 163-175.
[45] Nandy, Sagnik; Sen, Subhabrata. Bayes Optimal Learning in High-Dimensional Linear Regression With Network Side Information [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2025, 71(1): 565-591.
[46] Zhu, Wencan; Levy-Leduc, Celine; Ternes, Nils. Variable Selection in High-Dimensional Logistic Regression Models Using a Whitening Approach [J]. IEEE TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2025, 22(2): 800-807.
[47] Aguilera, AM; Escabias, M; Valderrama, MJ. Using principal components for estimating logistic regression with high-dimensional multicollinear data [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 50(8): 1905-1924.
[48] Lacroix, Perrine; Martin, Marie-Laure. Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection [J]. ELECTRONIC JOURNAL OF STATISTICS, 2024, 18(2): 2886-2930.
[49] Rajaratnam, Bala; Sparks, Doug; Khare, Kshitij; Zhang, Liyuan. Uncertainty Quantification for Modern High-Dimensional Regression via Scalable Bayesian Methods [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2019, 28(1): 174-184.
[50] Chen, Bingzhen; Kong, Lingchen. High-Dimensional Least Square Matrix Regression via Elastic Net Penalty [J]. PACIFIC JOURNAL OF OPTIMIZATION, 2017, 13(2): 185-196.