The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review

Cited by: 0

Authors:

Alhumaidi, Norah Hamad [1]

Dermawan, Doni [2]

Kamaruzaman, Hanin Farhana [3,4]

Alotaiq, Nasser [5]

Affiliations:

[1] Qassim Univ, Coll Med, Buraydah, Saudi Arabia

[2] Warsaw Univ Technol, Fac Chem, Appl Biotechnol, Warsaw, Poland

[3] Minist Hlth Malaysia, Med Dev Div, Malaysian Hlth Technol Assessment Sect, Wilayah Persekutuan, Putrajaya, Malaysia

[4] Univ Glasgow, Sch Hlth & Wellbeing, Hlth Econ & Hlth Technol Assessment, Glasgow City, Scotland

[5] Imam Mohammad ibn Saud Islamic Univ, Hlth Sci Res Ctr, Othman Bin Affan Rd,Al Nada, Riyadh 13317, Saudi Arabia

Source:

JMIR MEDICAL INFORMATICS | 2025 / Volume 13

Keywords:

machine learning; big data; real-world data; disease prediction; health care management; real-world evidence; artificial intelligence; AI; LOGISTIC-REGRESSION; SURVIVAL ANALYSIS; ALGORITHM; MODEL; IDENTIFICATION; HEALTH; CARE;

DOI:

10.2196/68898

Chinese Library Classification number:

R-058

Discipline classification code:

Abstract:

Background: Machine learning (ML) and big data analytics are rapidly transforming health care, particularly disease prediction, management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sources, such as electronic health records (EHRs), patient registries, and wearable devices, ML techniques present substantial potential to enhance clinical outcomes. Despite this promise, challenges such as data quality, model transparency, generalizability, and integration into clinical practice persist.

Objective: This systematic review aims to examine the use of ML for analyzing RWD in disease prediction and management, identifying the most commonly used ML methods, prevalent disease types, study designs, and the sources of real-world evidence (RWE). It also explores the strengths and limitations of current practices, offering insights for future improvements.

Methods: A comprehensive search was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies using ML techniques for analyzing RWD in disease prediction and management. The search focused on extracting data regarding the ML algorithms applied; disease categories studied; types of study designs (eg, clinical trials and cohort studies); and the sources of RWE, including EHRs, patient registries, and wearable devices. Studies published between 2014 and 2024 were included to ensure the analysis of the most recent advances in the field.

Results: This review identified 57 studies that met the inclusion criteria, with a total sample size of >150,000 patients. The most frequently applied ML methods were random forest (n=24, 42%), logistic regression (n=21, 37%), and support vector machines (n=18, 32%). These methods were predominantly used for predictive modeling across disease areas, including cardiovascular diseases (n=19, 33%), cancer (n=9, 16%), and neurological disorders (n=6, 11%). RWE was primarily sourced from EHRs, patient registries, and wearable devices. A substantial portion of studies (n=38, 67%) focused on improving clinical decision-making, patient stratification, and treatment optimization. Among these studies, 14 (25%) focused on decision-making; 12 (21%) on health care outcomes, such as quality of life, recovery rates, and adverse events; and 11 (19%) on survival prediction, particularly in oncology and chronic diseases. For example, random forest models for cardiovascular disease prediction demonstrated an area under the curve of 0.85 (95% CI 0.81-0.89), while support vector machine models for cancer prognosis achieved an accuracy of 83% (P=.04). Despite the promising outcomes, many (n=34, 60%) studies faced challenges related to data quality, model interpretability, and ensuring generalizability across diverse patient populations.

Conclusions: This systematic review highlights the significant potential of ML and big data analytics in health care, especially for improving disease prediction and management. However, to fully realize the benefits of these technologies, future research must focus on addressing the challenges of data quality, enhancing model transparency, and ensuring the broader applicability of ML models across diverse populations and clinical settings.
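The percentages in the Results can be checked against the reported counts. The sketch below assumes every percentage uses the 57 included studies as its denominator (the abstract does not state this explicitly); the names `TOTAL_STUDIES`, `REPORTED`, `percent_of_total`, and `check_reported` are illustrative and not taken from the review.

```python
# Counts and percentages copied from the abstract. Using 57 as the common
# denominator is an assumption about how the percentages were derived.
TOTAL_STUDIES = 57

REPORTED = {
    "random forest": (24, 42),
    "logistic regression": (21, 37),
    "support vector machines": (18, 32),
    "cardiovascular diseases": (19, 33),
    "cancer": (9, 16),
    "neurological disorders": (6, 11),
    "decision-making, stratification, optimization": (38, 67),
    "decision-making": (14, 25),
    "health care outcomes": (12, 21),
    "survival prediction": (11, 19),
    "data quality and related challenges": (34, 60),
}


def percent_of_total(count, total=TOTAL_STUDIES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def check_reported(reported=REPORTED, total=TOTAL_STUDIES):
    """Return labels whose reported percentage differs from count/total."""
    return [
        label
        for label, (count, pct) in reported.items()
        if percent_of_total(count, total) != pct
    ]


if __name__ == "__main__":
    for label, (count, pct) in REPORTED.items():
        computed = percent_of_total(count)
        print(f"{label}: {count}/{TOTAL_STUDIES} = {computed}% (reported {pct}%)")
    print("Mismatches:", check_reported())
```

Under this assumption, the subgroup figures introduced with "Among these studies" (14, 12, and 11) are also expressed as shares of all 57 studies rather than of the 38-study subset. The three method counts sum to more than 57, which suggests that a single study could apply more than one method.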


Pages: 22
