Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review

Times cited: 1
Authors
Moglia, Victoria [1]
Johnson, Owen [1]
Cook, Gordon [2, 3]
de Kamps, Marc [1]
Smith, Lesley [2]
Affiliations
[1] University of Leeds, School of Computing, Woodhouse Lane, Leeds LS2 9JT, England
[2] University of Leeds, Leeds Institute of Clinical Trials Research, Clarendon Way, Leeds LS2 9NL, England
[3] NIHR Leeds Biomedical Research Centre, Chapeltown Rd, Leeds LS7 4SA, England
Funding
UK Research and Innovation (UKRI)
Keywords
Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal; Artificial intelligence; Deep learning algorithm; Colorectal cancer; Pancreatic cancer; Risk prediction; Early diagnosis; Time; Models
DOI
10.1186/s12874-025-02473-w
Chinese Library Classification number
R19 [Health care organisation and services (health services management)]
Abstract
Background: Early detection and diagnosis of cancer are vital to improving patient outcomes. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review summarises the methods currently used to predict cancer from longitudinal data and provides recommendations on how such models should be developed.
Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts "artificial intelligence", "prediction", "health records", "longitudinal", and "cancer". Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.
Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen studies used deep learning models that take a direct sequential input, most commonly recurrent neural networks, but also convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced by inappropriate study design (n = 26) and sample size (n = 26).
Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
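To illustrate what a hand-engineered trend feature over a prediction window with a lead time might look like, here is a minimal Python sketch. It is not taken from any of the reviewed studies: the function name `trend_features`, its parameters `window_days` and `lead_days`, and the haemoglobin values in the usage note are all hypothetical, and the least-squares slope is only one of many possible ways to represent a trend.

```python
from datetime import date


def trend_features(measurements, index_date, window_days=365, lead_days=90):
    """Summarise dated values from an observation window that ends
    `lead_days` before `index_date` (hypothetical design, for illustration).

    measurements: iterable of (date, float) pairs, e.g. repeated blood tests.
    """
    end = index_date.toordinal() - lead_days   # last day the model may see
    start = end - window_days                  # first day of the window
    # Keep only in-window points; x is days since the window start.
    pts = sorted(
        (d.toordinal() - start, v)
        for d, v in measurements
        if start <= d.toordinal() <= end
    )
    n = len(pts)
    if n == 0:
        return {"n": 0, "last": None, "mean": None, "slope_per_day": None}

    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Ordinary least-squares slope; undefined if all points share one day.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in pts) / sxx
        if sxx > 0
        else None
    )
    return {"n": n, "last": ys[-1], "mean": mean_y, "slope_per_day": slope}
```

For example, calling `trend_features` on a list of `(date, haemoglobin)` pairs with `index_date` set to a diagnosis date would ignore any measurement taken in the final 90 days and return the count, last value, mean, and daily slope of the remainder. Stating the window and lead time explicitly in this way is one means of making clear where in a patient's trajectory a model is applied.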
Pages: 17