Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data

Cited by: 5
Authors
Read, Andrew J. [1,2,3]
Zhou, Wenjing [4]
Saini, Sameer D. [1,2,3,5]
Zhu, Ji [3,4]
Waljee, Akbar K. [1,2,3,5]
Affiliations
[1] Univ Michigan, Dept Internal Med, Div Gastroenterol & Hepatol, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Inst Healthcare Policy & Innovat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Michigan Integrated Ctr Hlth Analyt & Med Predict, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[5] VA HSR&D Ctr Clin Management Res, Ann Arbor, MI 48105 USA
Keywords
gastrointestinal cancers; prediction model; machine learning; colorectal cancer; risk factors; gastric cancer; diagnosis; prognosis; model
DOI
10.3390/cancers15051399
Chinese Library Classification (CLC) number
R73 [Oncology]
Subject classification code
100214
Abstract
Simple Summary: Cancers of the gastrointestinal tract (the esophagus, stomach, and intestines) are often diagnosed at an advanced stage, when curative treatment is rarely possible. All of these cancers can cause gastrointestinal bleeding, but the bleeding often occurs gradually and may go unnoticed by patients. Changes in routine laboratory parameters such as the complete blood count may reveal this subtle blood loss before clinical presentation or the development of iron deficiency anemia. The aim of our study was to develop models for predicting luminal gastrointestinal tract cancers (esophageal, gastric, small bowel, colorectal, and anal) from data routinely available in an electronic health record, using a retrospective cohort from an academic medical center. The cohort included 148,158 individuals, with 1025 gastrointestinal tract cancers. We found that longitudinal prediction models using the complete blood count outperformed a single-timepoint logistic regression model for 3-year cancer prediction.

Background: Luminal gastrointestinal (GI) tract cancers, including esophageal, gastric, small bowel, colorectal, and anal cancers, are often diagnosed at late stages. These tumors can cause gradual GI bleeding, which may go unrecognized but may be detectable through subtle laboratory changes. Our aim was to develop models that predict luminal GI tract cancers from laboratory studies and patient characteristics, using logistic regression and random forest machine learning methods.

Methods: This was a single-center retrospective cohort study, at an academic medical center, of patients who had at least two complete blood counts (CBCs), with enrollment between 2004 and 2013 and follow-up until 2018. The primary outcome was a diagnosis of GI tract cancer. Prediction models were developed using multivariable single-timepoint logistic regression, longitudinal logistic regression, and random forest machine learning.

Results: The cohort included 148,158 individuals, with 1025 GI tract cancers. For 3-year prediction of GI tract cancers, the longitudinal random forest model performed best, with an area under the receiver operating characteristic curve (AUROC) of 0.750 (95% CI 0.729-0.771) and a Brier score of 0.116, compared with the longitudinal logistic regression model, which had an AUROC of 0.735 (95% CI 0.713-0.757) and a Brier score of 0.205.

Conclusions: Prediction models incorporating longitudinal features of the CBC outperformed the single-timepoint logistic regression models at 3 years, with a trend toward improved predictive accuracy for the random forest machine learning model compared with the longitudinal logistic regression model.
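The Results compare models on two metrics, AUROC and Brier score. As a hedged illustration of what those numbers measure (this is not the authors' evaluation code, which the record does not describe), the minimal pure-Python sketch below computes AUROC through its Mann-Whitney rank formulation and the Brier score as a mean squared error on predicted probabilities. The function names `auroc` and `brier_score`, and the toy labels and risks, are hypothetical.

```python
"""Sketch of the two evaluation metrics named in the abstract (AUROC, Brier score).

Pure Python, no dependencies. The labels and predicted risks in the demo are
made-up toy values, not data from the study.
"""
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen case (y=1) gets a higher
    score than a randomly chosen non-case (y=0); ties count as 1/2.
    The pairwise loop is O(n_pos * n_neg), fine only for small illustrations.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy example: 1 = GI tract cancer diagnosed within 3 years.
    y = [0, 0, 1, 0, 1, 0]
    p = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
    print(f"AUROC = {auroc(y, p):.3f}")
    print(f"Brier = {brier_score(y, p):.4f}")
```

In the toy example, 7 of the 8 case/non-case pairs are ranked correctly, so AUROC = 0.875. AUROC reflects discrimination only and is unchanged by any monotone rescaling of the predicted risks, whereas the Brier score also penalizes miscalibration, so the two metrics capture different properties of a model. For a cohort of this size, a rank-based library routine such as scikit-learn's `roc_auc_score` (with `brier_score_loss` for the Brier score) would be the practical choice over the pairwise loop.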
Pages: 14