Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review

Cited by: 211
Authors
Andaur Navarro, Constanza L. [1,2]
Damen, Johanna A. A. [1,2]
Takada, Toshihiko [1]
Nijman, Steven W. J. [1]
Dhiman, Paula [3,4]
Ma, Jie [3]
Collins, Gary S. [3,4]
Bajpai, Ram [5]
Riley, Richard [5]
Moons, Karel G. M. [1,2]
Hooft, Lotty [1,2]
Affiliations
[1] Univ Utrecht, Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
[2] Univ Utrecht, Univ Med Ctr Utrecht, Cochrane Netherlands, Utrecht, Netherlands
[3] Univ Oxford, Nuffield Dept Orthopaed Rheumatol & Musculoskelet, Ctr Stat Med, Oxford, England
[4] Oxford Univ Hosp NHS Fdn Trust, NIHR Oxford Biomed Res Ctr, Oxford, England
[5] Keele Univ, Sch Med, Ctr Prognosis Res, Keele, Staffs, England
Source
BMJ-BRITISH MEDICAL JOURNAL | 2021 / Vol. 375
关键词
APPLICABILITY; PERFORMANCE; PROBAST; EVENTS; TOOL
DOI
10.1136/bmj.n2281
Chinese Library Classification (CLC)
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
OBJECTIVE To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
DESIGN Systematic review.
DATA SOURCES PubMed from 1 January 2018 to 31 December 2019.
ELIGIBILITY CRITERIA Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions were applied to study design, data source, or predicted patient related health outcomes.
REVIEW METHODS Methodological quality of the studies was determined and risk of bias evaluated using the prediction model risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and for each study overall.
RESULTS 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 (41%, 33% to 49%) handled missing data inadequately, and 59 (39%, 31% to 47%) assessed overfitting improperly. Most studies used appropriate data sources to develop (73%, 66% to 79%) and to externally validate (74%, 51% to 88%) the machine learning based prediction models. Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
CONCLUSION Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
SYSTEMATIC REVIEW REGISTRATION PROSPERO CRD42019161764.
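The results above report raw counts alongside 95% confidence intervals (for example, 148 of 171 analyses at high risk of bias, 87%, 81% to 91%). The abstract does not state which interval method was used; the minimal Python sketch below, assuming a Wilson score interval, reproduces the reported bounds. The helper name wilson_ci is illustrative and not taken from the paper.

```python
from math import sqrt

def wilson_ci(events: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = events / total
    denom = 1 + z ** 2 / total
    centre = p + z ** 2 / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2))
    return (centre - margin) / denom, (centre + margin) / denom

# Headline result: 148 of the 171 assessed analyses were rated at high risk of bias.
low, high = wilson_ci(148, 171)
print(f"148/171 = {148 / 171:.0%}, 95% CI {low:.0%} to {high:.0%}")  # 87%, 81% to 91%
```

Applying the same check to 85 of 152 models developed with an inadequate number of events per candidate predictor gives 48% to 64%, which also matches the interval reported in the abstract; this supports, but does not confirm, the assumption about the interval method.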
Pages: 9