A SEMIPARAMETRIC METHOD FOR RISK PREDICTION USING INTEGRATED ELECTRONIC HEALTH RECORD DATA

Authors
Hasler, Jill [1]
Ma, Yanyuan [2]
Wei, Yizheng [3]
Parikh, Ravi [4, 5]
Chen, Jinbo [6]
Affiliations
[1] Fox Chase Canc Ctr, Philadelphia, PA 19111 USA
[2] Penn State Univ, Dept Stat, University Pk, PA USA
[3] Univ South Carolina, Dept Stat, Columbia, SC USA
[4] Univ Penn, Dept Med Eth & Hlth Policy, Philadelphia, PA USA
[5] Univ Penn, Dept Med, Philadelphia, PA USA
[6] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA USA
Source
ANNALS OF APPLIED STATISTICS | 2024 / Vol. 18 / No. 04
Funding
National Institutes of Health (USA);
Keywords
Area under the ROC curve (AUC); integrated EHR data; semiparametric estimation; two-phase design; FITTING REGRESSION-MODELS; 2-STAGE CASE-CONTROL; LOGISTIC-REGRESSION; INCREMENTAL VALUE; ROC CURVE; 2-PHASE; BIOMARKERS; PARAMETERS; INFERENCE; DESIGN;
DOI
10.1214/24-AOAS1938
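Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code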
020208 ; 070103 ; 0714 ;
Abstract
When using electronic health records (EHRs) for clinical and translational research, additional data is often available from external sources to enrich the information extracted from EHRs. For example, academic biobanks have more granular data available, and patient-reported data is often collected through small-scale surveys. It is common that the external data is available only for a small subset of patients who have EHR information. We propose efficient and robust methods for building and evaluating models for predicting the risk of binary outcomes using such integrated EHR data. Our method is built upon an idea derived from the two-phase design literature: modeling the availability of a patient's external data as a function of an EHR-based preliminary predictive score leads to effective utilization of the EHR data. Through both theoretical and simulation studies, we show that our method has high efficiency for estimating log-odds ratio parameters, the area under the ROC curve, and other measures for quantifying predictive accuracy. We apply our method to develop a model for predicting the short-term mortality risk of oncology patients, where the data was extracted from the University of Pennsylvania hospital system EHR and combined with survey-based patient-reported outcome data.
Pages: 3318-3337
Page count: 20