Prediction of acute and chronic kidney diseases during the post-covid-19 pandemic with machine learning models: utilizing national electronic health records in the US

Times cited: 0
Authors
Zhang, Yue [1]
Ghahramani, Nasrollah [1,2]
Li, Runjia [3]
Chinchilli, Vernon M. [1]
Ba, Djibril M. [1]
Affiliations
[1] Penn State Coll Med, Dept Publ Hlth Sci, 90 Hope Dr, Hershey, PA 17033 USA
[2] Penn State Coll Med, Dept Med, Hershey, PA USA
[3] Univ Pittsburgh, Sch Publ Hlth, Dept Biostat, Pittsburgh, PA USA
Keywords
COVID-19; Kidney diseases; Machine learning; Real world data; Electronic health records; DIAGNOSIS; INJURY; AKI
DOI
10.1016/j.ebiom.2025.105726
CLC number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Background: COVID-19 has been linked to acute kidney injury (AKI) and chronic kidney disease (CKD), but machine learning (ML) models that predict these risks in the post-pandemic period have been lacking. We aimed to use large-scale electronic health records (EHR) and ML algorithms to predict the incidence of AKI and CKD during the post-pandemic period, to assess whether COVID-19 infection history needs to be included as a predictor, and to develop a practical web application for clinical use.

Methods: National EHR data from TriNetX were used to emulate a prospective cohort of 104,565 patients from July 1, 2022 to March 31, 2024. A total of 69 baseline variables were included, covering demographics, comorbidities, laboratory test results, vital signs, medication history, hospitalization visits, and COVID-19-related variables. Prediction windows of 1 month and 1 year were defined to assess AKI and CKD incidence. Eight ML models were applied, including extreme gradient boosting (XGBoost), neural network, and random forest (RF), with cross-validation and model tuning performed during training. Model performance was evaluated with six metrics, including the area under the receiver operating characteristic curve (AUROC). The final models were identified using a combination of model-driven, data-driven, and clinically driven methods, and an application embedding them was built with the R Shiny framework.

Findings: The selected final models incorporate 9 variables, most notably estimated glomerular filtration rate (eGFR), number of inpatient visits, and number of COVID-19 infections. XGBoost performed best for predicting AKI incidence within 1 month (AUROC = 0.803), AKI within 1 year (AUROC = 0.799), and CKD within 1 year (AUROC = 0.894). RF was selected for predicting CKD incidence within 1 month (AUROC = 0.896). Comparing AUROC with and without COVID-19 infection history as a predictor confirmed that it is a critical predictor in the models. The final models were translated into a convenient tool to facilitate their use in clinical settings.

Interpretation: Our study demonstrates that large national EHR data can be used to develop high-performance ML models that predict AKI and CKD risk in the post-COVID-19 period. Including the number of COVID-19 infections in the past year improved prediction performance and should be considered in future kidney disease prediction models. A user-friendly application was created to support clinicians in risk assessment and surveillance.
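The abstract ranks candidate models, and models with versus without COVID-19 infection history, by AUROC. As a minimal sketch of how that metric is computed (an illustration only, not the authors' pipeline, which the abstract does not detail), the Python below computes AUROC from predicted risks using the rank-sum (Mann-Whitney U) formulation: the probability that a randomly chosen patient who develops the outcome receives a higher predicted risk than one who does not, with ties counted as one half. The function name `auroc` and all data values are hypothetical; the six patients and their scores are made-up toy numbers chosen only to show how a with/without comparison is read.

```python
# Illustrative AUROC computation (hypothetical sketch, not the study's code).
# AUROC = P(risk score of a random positive > risk score of a random negative),
# with ties counted as 1/2. Computed as the Mann-Whitney U statistic divided by
# n_pos * n_neg, using average ranks so the cost is O(n log n).


def auroc(labels, scores):
    """Return the AUROC for binary labels (0/1) and predicted risk scores."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Assign 1-based ranks by ascending score; tied scores share their average rank.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U for the positive class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data (NOT from the study): six patients, three with the outcome.
labels = [0, 0, 0, 1, 1, 1]
# Hypothetical risk scores from a model without vs. with an extra predictor.
scores_without = [0.2, 0.6, 0.4, 0.5, 0.3, 0.7]
scores_with = [0.2, 0.5, 0.3, 0.6, 0.4, 0.7]

if __name__ == "__main__":
    base = auroc(labels, scores_without)
    extended = auroc(labels, scores_with)
    print(f"AUROC without: {base:.3f}, with: {extended:.3f}")  # 0.667 and 0.889
```

In practice, for a cohort of the size described above, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity; the hand-written version is shown only to make the definition concrete.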
Pages: 12