Debiased lasso after sample splitting for estimation and inference in high-dimensional generalized linear models

Citations: 0
Authors
Vazquez, Omar [1]
Nan, Bin [2]
Affiliations
[1] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA USA
[2] Univ Calif Irvine, Dept Stat, Irvine, CA 92697 USA
Source
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE | 2025 / Vol. 53 / No. 1
Keywords
Asymptotic normality; genetic marker; high-dimensional inference; single nucleotide polymorphism; sparse regression; CONFIDENCE-INTERVALS; VARIABLE SELECTION; SMOKING; REGULARIZATION; ASSOCIATION; REGRESSION; REGIONS; GENES; TESTS;
DOI
10.1002/cjs.11827
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
We consider random sample splitting for estimation and inference in high-dimensional generalized linear models (GLMs), where we first apply the lasso to select a submodel using one subsample and then apply the debiased lasso to fit the selected model using the remaining subsample. We show that a sample splitting procedure based on the debiased lasso yields asymptotically normal estimates under mild conditions and that multiple splitting can address the loss of efficiency. Our simulation results indicate that using the debiased lasso instead of the standard maximum likelihood method in the estimation stage can vastly reduce the bias and variance of the resulting estimates. Furthermore, our multiple splitting debiased lasso method has better numerical performance than some existing methods for high-dimensional GLMs proposed in the recent literature. We illustrate the proposed multiple splitting method with an analysis of the smoking data of the Mid-South Tobacco Case-Control Study.
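The two-stage procedure described in the abstract can be illustrated with a minimal sketch for logistic regression. This is not the authors' implementation. It assumes a plain proximal-gradient lasso, a one-step debiasing correction that inverts the observed Hessian of the selected submodel, a 50/50 split, and simple averaging over splits with the target covariate always kept in the fitted model. The function names (`lasso_logistic`, `debiased_lasso_logistic`, `split_debiased_lasso`), the tuning values, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression (no intercept) via proximal gradient descent."""
    n, p = X.shape
    # Step size 1/L, using the bound L <= ||X||_2^2 / (4n) for the averaged logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return beta

def debiased_lasso_logistic(X, y, lam):
    """One-step debiased lasso on a low-dimensional design (assumed form of the correction)."""
    n = X.shape[0]
    beta = lasso_logistic(X, y, lam)
    mu = sigmoid(X @ beta)
    score = X.T @ (y - mu) / n                          # negative gradient of the loss
    hessian = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
    theta = np.linalg.inv(hessian)                      # requires an invertible Hessian
    beta_db = beta + theta @ score                      # bias correction step
    se = np.sqrt(np.diag(theta) / n)                    # model-based standard errors
    return beta_db, se

def split_debiased_lasso(X, y, target, lam_select, lam_fit, n_splits=10, seed=0):
    """Average the debiased estimate of coefficient `target` over random splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]
        # Stage 1: lasso selection on one subsample; the target is always kept.
        beta_sel = lasso_logistic(X[sel_idx], y[sel_idx], lam_select)
        selected = np.union1d(np.flatnonzero(beta_sel), [target])
        # Stage 2: debiased lasso on the selected submodel using the other subsample.
        beta_db, _ = debiased_lasso_logistic(X[fit_idx][:, selected], y[fit_idx], lam_fit)
        estimates.append(beta_db[np.searchsorted(selected, target)])
    return float(np.mean(estimates)), np.array(estimates)

# Simulated example: 400 observations, 50 covariates, 3 nonzero coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 50))
beta_true = np.zeros(50)
beta_true[:3] = [1.0, -1.0, 0.5]
y = (rng.random(400) < sigmoid(X @ beta_true)).astype(float)
estimate, per_split = split_debiased_lasso(X, y, target=0, lam_select=0.05, lam_fit=0.01)
print(estimate)
```

The sketch only produces a point estimate by averaging. Valid confidence intervals for a multiple splitting estimator need a variance estimate that accounts for the dependence between splits, which is not attempted here.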
Pages: 23