The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review

Cited by: 0
Authors
Alhumaidi, Norah Hamad [1]
Dermawan, Doni [2]
Kamaruzaman, Hanin Farhana [3, 4]
Alotaiq, Nasser [5]
Affiliations
[1] Qassim Univ, Coll Med, Buraydah, Saudi Arabia
[2] Warsaw Univ Technol, Fac Chem, Appl Biotechnol, Warsaw, Poland
[3] Minist Hlth Malaysia, Med Dev Div, Malaysian Hlth Technol Assessment Sect, Wilayah Persekutuan, Putrajaya, Malaysia
[4] Univ Glasgow, Sch Hlth & Wellbeing, Hlth Econ & Hlth Technol Assessment, Glasgow City, Scotland
[5] Imam Mohammad ibn Saud Islamic Univ, Hlth Sci Res Ctr, Othman Bin Affan Rd,Al Nada, Riyadh 13317, Saudi Arabia
Keywords
machine learning; big data; real-world data; disease prediction; health care management; real-world evidence; artificial intelligence; AI; logistic regression; survival analysis; algorithm; model; identification; health; care
DOI
10.2196/68898
CLC classification number
R-058
Abstract
Background: Machine learning (ML) and big data analytics are rapidly transforming health care, particularly disease prediction, management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sources, such as electronic health records (EHRs), patient registries, and wearable devices, ML techniques present substantial potential to enhance clinical outcomes. Despite this promise, challenges such as data quality, model transparency, generalizability, and integration into clinical practice persist.
Objective: This systematic review aims to examine the use of ML for analyzing RWD in disease prediction and management, identifying the most commonly used ML methods, prevalent disease types, study designs, and the sources of real-world evidence (RWE). It also explores the strengths and limitations of current practices, offering insights for future improvements.
Methods: A comprehensive search was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies using ML techniques for analyzing RWD in disease prediction and management. Data were extracted on the ML algorithms applied; disease categories studied; types of study designs (eg, clinical trials and cohort studies); and the sources of RWE, including EHRs, patient registries, and wearable devices. Studies published between 2014 and 2024 were included to ensure the analysis of the most recent advances in the field.
Results: This review identified 57 studies that met the inclusion criteria, with a total sample size of >150,000 patients. The most frequently applied ML methods were random forest (n=24, 42%), logistic regression (n=21, 37%), and support vector machines (n=18, 32%). These methods were predominantly used for predictive modeling across disease areas, including cardiovascular diseases (n=19, 33%), cancer (n=9, 16%), and neurological disorders (n=6, 11%). RWE was primarily sourced from EHRs, patient registries, and wearable devices. A substantial portion of studies (n=38, 67%) focused on improving clinical decision-making, patient stratification, and treatment optimization. Among these studies, 14 (25%) focused on decision-making; 12 (21%) on health care outcomes, such as quality of life, recovery rates, and adverse events; and 11 (19%) on survival prediction, particularly in oncology and chronic diseases. For example, random forest models for cardiovascular disease prediction demonstrated an area under the curve of 0.85 (95% CI 0.81-0.89), while support vector machine models for cancer prognosis achieved an accuracy of 83% (P=.04). Despite the promising outcomes, many (n=34, 60%) studies faced challenges related to data quality, model interpretability, and generalizability across diverse patient populations.
Conclusions: This systematic review highlights the significant potential of ML and big data analytics in health care, especially for improving disease prediction and management. However, to fully realize the benefits of these technologies, future research must focus on addressing the challenges of data quality, enhancing model transparency, and ensuring the broader applicability of ML models across diverse populations and clinical settings.
Pages: 22