Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data

Cited by: 3
Authors
Read, Andrew J. [1, 2, 3]
Zhou, Wenjing [4]
Saini, Sameer D. [1, 2, 3, 5]
Zhu, Ji [3, 4]
Waljee, Akbar K. [1, 2, 3, 5]
Affiliations
[1] Univ Michigan, Dept Internal Med, Div Gastroenterol & Hepatol, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Inst Healthcare Policy & Innovat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Michigan Integrated Ctr Hlth Analyt & Med Predict, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[5] VA HSR&D Ctr Clin Management Res, Ann Arbor, MI 48105 USA
Keywords
gastrointestinal cancers; prediction model; machine learning; COLORECTAL-CANCER; RISK-FACTORS; GASTRIC-CANCER; DIAGNOSIS; PROGNOSIS; MODEL
DOI
10.3390/cancers15051399
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Simple Summary: Cancers of the gastrointestinal tract, including the esophagus, stomach, and intestines, are often diagnosed at an advanced stage, when curative treatment is rarely possible. These cancers can all cause gastrointestinal bleeding, but this often occurs gradually and may go unnoticed by patients. Changes in routine laboratory parameters such as the complete blood count may be able to show this gradual bleeding before clinical presentation or the development of iron deficiency anemia. The aim of our study was to develop models for the prediction of luminal gastrointestinal tract cancers (esophageal, gastric, small bowel, colorectal, anal) using data routinely available within an electronic health record, in a retrospective cohort from an academic medical center. The cohort included 148,158 individuals, with 1025 gastrointestinal tract cancers. We found that longitudinal prediction models using the complete blood count outperformed a single timepoint logistic model for 3-year cancer prediction.

Background: Luminal gastrointestinal (GI) tract cancers, including esophageal, gastric, small bowel, colorectal, and anal cancers, are often diagnosed at late stages. These tumors can cause gradual GI bleeding, which may be unrecognized but detectable by subtle laboratory changes. Our aim was to develop models to predict luminal GI tract cancers from laboratory studies and patient characteristics, using logistic regression and random forest machine learning methods.

Methods: This was a single-center, retrospective cohort study at an academic medical center of patients enrolled between 2004 and 2013, with follow-up until 2018, who had at least two complete blood counts (CBCs). The primary outcome was the diagnosis of GI tract cancer. Prediction models were developed using multivariable single timepoint logistic regression, longitudinal logistic regression, and random forest machine learning.

Results: The cohort included 148,158 individuals, with 1025 GI tract cancers. For 3-year prediction of GI tract cancers, the longitudinal random forest model performed best, with an area under the receiver operating characteristic curve (AuROC) of 0.750 (95% CI 0.729-0.771) and a Brier score of 0.116, compared to the longitudinal logistic regression model, with an AuROC of 0.735 (95% CI 0.713-0.757) and a Brier score of 0.205.

Conclusions: Prediction models incorporating longitudinal features of the CBC outperformed the single timepoint logistic regression models at 3 years, with a trend toward improved prediction accuracy for the random forest machine learning model compared to the longitudinal logistic regression model.
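The two metrics reported in the abstract, AuROC and the Brier score, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auroc` and `brier_score` and the toy labels and probabilities are assumptions made up for illustration, and the pairwise AuROC loop is only practical for small inputs.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 means every prediction was exactly right.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a randomly chosen case is scored higher than a
    randomly chosen non-case, counting ties as one half."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical toy data (not from the study): 1 = cancer within 3 years.
labels = [0, 0, 1, 0, 1, 0]
predicted = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]

print(f"AuROC: {auroc(labels, predicted):.3f}")
print(f"Brier score: {brier_score(labels, predicted):.4f}")
```

AuROC measures discrimination (how well cases are ranked above non-cases), while the Brier score also reflects calibration, which is why two models with similar AuROC values can have noticeably different Brier scores.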
Pages: 14
Related papers (50 in total)
  • [1] Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data
    Read, Andrew J.
    Zhou, Wenjing
    Saini, Sameer D.
    Zhu, Ji
    Waljee, Akbar K.
    GASTROENTEROLOGY, 2022, 162 (07) : S1045 - S1045
  • [2] Identification of urinary tract infections using electronic health record data
    Colborn, Kathryn L.
    Bronsert, Michael
    Hammermeister, Karl
    Henderson, William G.
    Singh, Abhinav B.
    Meguid, Robert A.
    AMERICAN JOURNAL OF INFECTION CONTROL, 2019, 47 (04) : 371 - 375
  • [3] Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction
    Zhao, Juan
    Feng, QiPing
    Wu, Patrick
    Lupu, Roxana A.
    Wilke, Russell A.
    Wells, Quinn S.
    Denny, Joshua C.
    Wei, Wei-Qi
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [5] Machine Learning Prognostic Models for Gastrointestinal Bleeding Using Electronic Health Record Data
    Shung, Dennis
    Laine, Loren
    AMERICAN JOURNAL OF GASTROENTEROLOGY, 2020, 115 (08) : 1199 - 1200
  • [6] A Semiparametric Method for Risk Prediction Using Integrated Electronic Health Record Data
    Hasler, Jill
    Ma, Yanyuan
    Wei, Yizheng
    Parikh, Ravi
    Chen, Jinbo
    ANNALS OF APPLIED STATISTICS, 2024, 18 (04) : 3318 - 3337
  • [7] Prediction of obstetrical and fetal complications using automated electronic health record data
    Escobar, Gabriel J.
    Soltesz, Lauren
    Schuler, Alejandro
    Niki, Hamid
    Malenica, Ivana
    Lee, Catherine
    AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2021, 224 (02) : 137 - 147
  • [8] The Impact of Longitudinal Data-Completeness of Electronic Health Record Data on the Prediction Performance of Clinical Risk Scores
    Jin, Yinzhu
    Weberpals, Janick G.
    Wang, Shirley V.
    Desai, Rishi J.
    Merola, David
    Lin, Kueiyu Joshua
    CLINICAL PHARMACOLOGY & THERAPEUTICS, 2023, 113 (06) : 1359 - 1367
  • [9] Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts
    Li, Yikuan
    Salimi-Khorshidi, Gholamreza
    Rao, Shishir
    Canoy, Dexter
    Hassaine, Abdelaali
    Lukasiewicz, Thomas
    Rahimi, Kazem
    Mamouei, Mohammad
    EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2022, 3 (04) : 535 - 547
  • [10] The impact of longitudinal data-completeness of electronic health record (EHR) data on prediction performance of clinical risk scores
    Lin, Joshua
    Jin, Yinzhu
    Schneeweiss, Sebastian
    Merola, Dave
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 : 302 - 302