Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review

Cited by: 1

Authors:

Moglia, Victoria [1]

Johnson, Owen [1]

Cook, Gordon [2,3]

de Kamps, Marc [1]

Smith, Lesley [2]

Affiliations:

[1] Univ Leeds, Sch Comp, Woodhouse Lane, Leeds LS2 9JT, England

[2] Univ Leeds, Leeds Inst Clin Trials Res, Clarendon Way, Leeds LS2 9NL, England

[3] NIHR Leeds Biomed Res Ctr, Chapeltown Rd, Leeds LS7 4SA, England

Source:

BMC MEDICAL RESEARCH METHODOLOGY | 2025 / Vol. 25 / Issue 01

Funding:

UK Research and Innovation (UKRI);

Keywords:

Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal; Artificial intelligence; Deep learning algorithm; Colorectal cancer; Pancreatic cancer; Risk prediction; Early diagnosis; Time; Models

DOI:

10.1186/s12874-025-02473-w

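Chinese Library Classification (CLC) number:

R19 [Health organizations and services (health services administration)];

Subject classification code:

Abstract: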

Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and provides recommendations on how such models should be developed.

Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts "artificial intelligence", "prediction", "health records", "longitudinal", and "cancer". Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.

Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models which take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced due to inappropriate study design (n = 26) and sample size (n = 26).

Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.

Pages: 17