Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model

Cited by: 0
Authors
Kim, Hyunjin [1]
Lee, Eun Ryung [1]
Park, Seyoung [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Stat, Seoul 100190, South Korea
Funding
National Research Foundation of Singapore;
Keywords
CONFIDENCE-INTERVALS; DRUG-SENSITIVITY; MONOAMINE-OXIDASE; SELECTION; IDENTIFICATION; METAANALYSIS; PREDICTION; SHRINKAGE; SPARSITY; REGIONS;
DOI
10.1038/s41598-023-48903-x
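Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code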
07 ; 0710 ; 09 ;
Abstract
Due to the prevalence of complex data, data heterogeneity is often observed in contemporary scientific studies and various applications. Motivated by studies on cancer cell lines, we consider the analysis of heterogeneous subpopulations with binary responses and high-dimensional covariates. In many practical scenarios, it is common to use a single regression model for the entire data set. To do this effectively, it is critical to quantify the heterogeneity of the effect of covariates across subpopulations through appropriate statistical inference. However, the high dimensionality and discrete nature of the data can lead to challenges in inference. Therefore, we propose a novel statistical inference method for a high-dimensional logistic regression model that accounts for heterogeneous subpopulations. Our primary goal is to investigate heterogeneity across subpopulations by testing the equivalence of the effect of a covariate and the significance of the overall effects of a covariate. To achieve overall sparsity of the coefficients and their fusions across subpopulations, we employ a fused group Lasso penalization method. In addition, we develop a statistical inference method that incorporates bias correction of the proposed penalized method. To address computational issues due to the nonlinear log-likelihood and the fused Lasso penalty, we propose a computationally efficient and fast algorithm by adapting the ideas of the proximal gradient method and the alternating direction method of multipliers (ADMM) to our settings. Furthermore, we develop non-asymptotic analyses for the proposed fused group Lasso and prove that the debiased test statistics admit chi-squared approximations even in the presence of high-dimensional variables. In simulations, the proposed test outperforms existing methods. The practical effectiveness of the proposed method is demonstrated by analyzing data from the Cancer Cell Line Encyclopedia (CCLE).
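The abstract names the proximal gradient method as one ingredient of the fitting algorithm. The following is a minimal illustrative sketch of that idea only, not the authors' algorithm: it fits a logistic regression per subpopulation with a plain group Lasso penalty that ties each covariate's coefficients together across subpopulations. The fusion term of the penalty and the ADMM step are deliberately omitted, and all function names, the data layout, and the step-size rule are assumptions made for illustration.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole vector toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def fit_group_lasso_logistic(X_list, y_list, lam, n_iter=500):
    """Proximal gradient for group-Lasso-penalized logistic regression.

    X_list[k]: (n_k, p) covariates of subpopulation k (hypothetical layout).
    y_list[k]: 0/1 responses of subpopulation k.
    Returns B with shape (p, K); row j holds the coefficients of covariate j
    across the K subpopulations, and the penalty is lam * sum_j ||B[j]||_2.
    """
    K = len(X_list)
    p = X_list[0].shape[1]
    n = sum(len(y) for y in y_list)

    # The logistic Hessian is block diagonal and each block is bounded by
    # X_k^T X_k / (4 n), so this is an upper bound on the Lipschitz constant.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 for X in X_list) / (4.0 * n)
    step = 1.0 / lipschitz

    B = np.zeros((p, K))
    for _ in range(n_iter):
        # Gradient of the averaged negative log-likelihood.
        G = np.empty_like(B)
        for k in range(K):
            prob = 1.0 / (1.0 + np.exp(-(X_list[k] @ B[:, k])))
            G[:, k] = X_list[k].T @ (prob - y_list[k]) / n
        # Gradient step followed by the row-wise proximal (shrinkage) step.
        Z = B - step * G
        for j in range(p):
            B[j] = group_soft_threshold(Z[j], step * lam)
    return B
```

Calling `fit_group_lasso_logistic(X_list, y_list, lam=0.05)` on a list of per-subpopulation design matrices returns a coefficient matrix in which a whole row is exactly zero when the corresponding covariate is dropped for every subpopulation. A debiasing step of the kind the abstract describes would be applied on top of such an estimate and is not shown here.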
Pages: 19