A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models

Cited by: 1177
Authors
Christodoulou, Evangelia [1]
Ma, Jie [2]
Collins, Gary S. [2,3]
Steyerberg, Ewout W. [4]
Verbakel, Jan Y. [1,5,6]
Van Calster, Ben [1,4]
Affiliations
[1] Katholieke Universiteit Leuven, Department of Development and Regeneration, Herestraat 49 Box 805, B-3000 Leuven, Belgium
[2] University of Oxford, Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, Windmill Road, Oxford OX3 7LD, England
[3] Oxford University Hospitals NHS Foundation Trust, Oxford, England
[4] Leiden University Medical Center, Department of Biomedical Data Sciences, Albinusdreef 2, NL-2333 ZA Leiden, Netherlands
[5] Katholieke Universiteit Leuven, Department of Public Health and Primary Care, Kapucijnenvoer 33J Box 7001, B-3000 Leuven, Belgium
[6] University of Oxford, Nuffield Department of Primary Care Health Sciences, Woodstock Road, Oxford OX2 6GG, England
Keywords
Clinical prediction models; Logistic regression; Machine learning; AUC; Calibration; Reporting; Artificial neural networks; Big data; Emergency department; Vein thrombosis; Risk; Cancer; Failure; Classification; Diagnosis; Mortality
DOI
10.1016/j.jclinepi.2019.02.004
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Objectives: The objective of this study was to compare the performance of logistic regression (LR) with that of machine learning (ML) for clinical prediction modeling in the literature.

Study Design and Setting: We conducted a MEDLINE literature search (January 2016 to August 2017) and extracted comparisons between LR and ML models for binary outcomes.

Results: We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with a median of 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and an ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML.

Conclusion: We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.

© 2019 Elsevier Inc. All rights reserved.
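The abstract reports its effect sizes as differences in logit(AUC), where logit(p) = ln(p / (1 - p)). The following Python sketch shows how such a difference can be computed for a single LR/ML pair and what a given logit shift implies back on the AUC scale. It assumes the difference is taken as logit(AUC_ML) - logit(AUC_LR) for one comparison; the function names and example AUC values are hypothetical, and the review's pooling of differences across comparisons is not reproduced here.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_auc_difference(auc_ml: float, auc_lr: float) -> float:
    """Hypothetical helper: logit(AUC_ML) - logit(AUC_LR); positive favours ML."""
    return logit(auc_ml) - logit(auc_lr)


def shifted_auc(auc_lr: float, delta: float) -> float:
    """Hypothetical helper: AUC implied by adding `delta` on the logit scale."""
    return inv_logit(logit(auc_lr) + delta)


if __name__ == "__main__":
    # Illustrative AUC pair for one comparison (made-up values, not study data).
    print(round(logit_auc_difference(0.80, 0.75), 3))
    # Effect of a 0.34 logit shift at a few illustrative reference AUCs.
    for auc in (0.60, 0.75, 0.90):
        print(auc, round(shifted_auc(auc, 0.34), 3))
```

Because the shift is additive on the log-odds scale, the same 0.34 difference corresponds to different AUC gains depending on the starting point: roughly 0.60 to 0.68, 0.75 to 0.81, and 0.90 to 0.93 in the loop above.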
Pages: 12-22
Page count: 11