Robust and efficient semi-supervised learning for Ising model

Cited by: 0
Authors
Wu, Daiqing [1]
Liu, Molei [2, 3]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON M5S 1A1, Canada
[2] Peking Univ, Peking Univ Hlth Sci Ctr, Dept Biostat, Beijing 100191, Peoples R China
[3] Peking Univ, Beijing Int Ctr Math Res, Beijing 100191, Peoples R China
Keywords
EHR surrogate; intrinsic efficiency; Ising model; score function; semi-supervised learning; REGRESSION; SELECTION;
DOI
10.1093/biomtc/ujaf060
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
In biomedical studies, it is often desirable to characterize how multiple disease outcomes interact, beyond their marginal risks. The Ising model is one of the most popular choices for this purpose. Nevertheless, the learning efficiency of Ising models can be impeded by the scarcity of accurate disease labels, a prominent problem in contemporary studies driven by electronic health records (EHRs). Semi-supervised learning (SSL), which leverages a large unlabeled sample with auxiliary EHR features to improve on learning from the labeled data alone, is a potential solution to this issue. In this paper, we develop a novel SSL method for efficient inference in the Ising model. Our method first models the outcomes against the auxiliary features, then uses this model to project the score function of the supervised estimator onto the EHR features, and finally incorporates the unlabeled sample to augment the supervised estimator, reducing variance without introducing bias. For the key step of conditional modeling, we propose strategies that effectively leverage the auxiliary EHR information while maintaining moderate model complexity. In addition, we introduce approaches, including intrinsic efficient updates and ensembling, to overcome potential misspecification of the conditional model, which may cause efficiency loss. Our method is justified by asymptotic theory and is shown to outperform existing SSL methods in simulation studies. We also illustrate its utility in a real example involving several key phenotypes related to frequent intensive care unit (ICU) admission, using the MIMIC-III data set.
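The augmentation idea described in the abstract can be illustrated with a much simpler toy problem. The sketch below is not the authors' estimator: instead of projecting the score function of a full Ising model fit, it estimates the single moment E[Y1*Y2] of a two-node Ising model with no main effects (where this moment equals tanh(theta)), uses a linear working model on noisy surrogate features, and subtracts the difference between the labeled and full-sample averages of the fitted values. All function names (`simulate`, `augmented_estimate`, `run_experiment`), the surrogate design, and the sample sizes are assumptions made for illustration only.

```python
import numpy as np


def simulate(n_total, theta, noise_sd, rng):
    """Draw (Y1, Y2) from a two-node Ising model P(y1, y2) ∝ exp(theta*y1*y2)
    and build hypothetical surrogate features (noisy copies of the outcomes)."""
    y1 = rng.choice([-1.0, 1.0], size=n_total)
    # With no main effects, Y2 agrees with Y1 with probability 1 / (1 + exp(-2*theta)).
    agree = rng.random(n_total) < 1.0 / (1.0 + np.exp(-2.0 * theta))
    y2 = np.where(agree, y1, -y1)
    s1 = y1 + rng.normal(0.0, noise_sd, n_total)
    s2 = y2 + rng.normal(0.0, noise_sd, n_total)
    X = np.column_stack([s1, s2, s1 * s2])  # assumed auxiliary features
    return y1, y2, X


def augmented_estimate(g_lab, X_lab, X_all):
    """Return (supervised, semi-supervised) estimates of E[g].

    g_lab: values of the target quantity on the labeled sample.
    X_lab: features of the labeled sample; X_all: features of all samples.
    """
    A_lab = np.column_stack([np.ones(len(X_lab)), X_lab])
    A_all = np.column_stack([np.ones(len(X_all)), X_all])
    # Working model: least-squares projection of g onto the features.
    beta, *_ = np.linalg.lstsq(A_lab, g_lab, rcond=None)
    theta_sup = g_lab.mean()
    # Zero-mean correction: fitted values averaged over labeled vs. all data.
    correction = (A_lab @ beta).mean() - (A_all @ beta).mean()
    return theta_sup, theta_sup - correction


def run_experiment(n_lab=100, n_unlab=5000, theta=0.5, reps=200, seed=0):
    """Repeat the toy simulation; each row is (supervised, semi-supervised)."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, 2))
    for r in range(reps):
        y1, y2, X = simulate(n_lab + n_unlab, theta, 0.5, rng)
        g_lab = y1[:n_lab] * y2[:n_lab]  # only the first n_lab rows are "labeled"
        out[r] = augmented_estimate(g_lab, X[:n_lab], X)
    return out


if __name__ == "__main__":
    res = run_experiment()
    print("target tanh(theta):", np.tanh(0.5))
    print("mean (sup, ssl):   ", res.mean(axis=0))
    print("var  (sup, ssl):   ", res.var(axis=0))
```

Because the correction term has mean close to zero whether or not the linear working model is correct, the toy estimator stays centered near the target, and its variance shrinks to the extent that the features predict the labeled quantity. The paper's conditional modeling strategies and intrinsic efficient updates address the harder case that this sketch ignores.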
Pages: 13