Logistic regression was as good as machine learning for predicting major chronic diseases

Cited by: 289
Authors
Nusinovici, Simon [1]
Tham, Yih Chung [1, 3]
Yan, Marco Yu Chak [1]
Ting, Daniel Shu Wei [1, 3]
Li, Jialiang [1, 4]
Sabanayagam, Charumathi [1, 3]
Wong, Tien Yin [1, 2, 3]
Cheng, Ching-Yu [1, 2, 3]
Affiliations
[1] Singapore Natl Eye Ctr, Singapore Eye Res Inst, Singapore, Singapore
[2] Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Ophthalmol, Singapore, Singapore
[3] Duke NUS Med Sch, Ophthalmol & Visual Sci Acad Clin Programme, Singapore, Singapore
[4] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore, Singapore
Funding
UK Medical Research Council;
Keywords
Machine learning; Logistic regression; Prognostic modeling; Chronic diseases; Interaction; Nonlinearity; SINGAPORE MALAY EYE; CONVENTIONAL REGRESSION; CARDIOVASCULAR-DISEASE; RISK PREDICTION; METHODOLOGY; CLASSIFICATION; RATIONALE; PROGNOSIS; MORTALITY; DIAGNOSIS;
DOI
10.1016/j.jclinepi.2020.03.002
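Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract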
Objective: To evaluate the performance of machine learning (ML) algorithms and to compare them with logistic regression for predicting the risk of cardiovascular diseases (CVDs), chronic kidney disease (CKD), diabetes (DM), and hypertension (HTN) in a prospective cohort study using simple clinical predictors. Study Design and Setting: We conducted analyses in a population-based cohort study in Asian adults (n = 6,762). Five different ML models were considered (single-hidden-layer neural network, support vector machine, random forest, gradient boosting machine, and k-nearest neighbor) and were compared with standard logistic regression. Results: The 6-year incidences of CVD, CKD, DM, and HTN were 4.0%, 7.0%, 9.2%, and 34.6%, respectively. Logistic regression reached the highest area under the receiver operating characteristic curve for CKD (0.905 [0.88, 0.93]) and DM (0.768 [0.73, 0.81]) predictions. For CVD and HTN, the best models were neural network (0.753 [0.70, 0.81]) and support vector machine (0.780 [0.747, 0.812]), respectively. However, the differences from logistic regression were small (less than 1%) and nonsignificant. Logistic regression, gradient boosting machine, and neural network were systematically ranked among the best models. Conclusion: Logistic regression performs as well as ML models in predicting the risk of major chronic diseases with low incidence and simple clinical predictors. © 2020 Elsevier Inc. All rights reserved.
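The abstract reports each model's area under the receiver operating characteristic curve (AUC) with an interval in brackets. The following is a minimal sketch of how such a figure could be computed, not the authors' procedure: the function names `auc` and `bootstrap_ci`, the percentile bootstrap, and the toy data are all assumptions made for illustration, since the abstract does not state how the intervals were obtained.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random case scores higher than a
    random non-case; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    chosen here only because it is simple to illustrate)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = incident disease, 0 = no disease.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score))           # 0.75
print(bootstrap_ci(y_true, y_score))  # (lower, upper) bounds
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch; a rank-based formula would be preferable for a cohort of several thousand participants.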
Pages: 56-69
Number of pages: 14
Related papers
50 in total
[41]   Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints [J].
van der Ploeg, Tjeerd ;
Austin, Peter C. ;
Steyerberg, Ewout W. .
BMC MEDICAL RESEARCH METHODOLOGY, 2014, 14
[42]   Prediction of intracranial findings on CT-scans by alternative modelling techniques [J].
van der Ploeg, Tjeerd ;
Smits, Marion ;
Dippel, Diederik W. ;
Hunink, Myriam ;
Steyerberg, Ewout W. .
BMC MEDICAL RESEARCH METHODOLOGY, 2011, 11
[43]   Comparison of imputation methods for missing laboratory data in medicine [J].
Waljee, Akbar K. ;
Mukherjee, Ashin ;
Singal, Amit G. ;
Zhang, Yiwei ;
Warren, Jeffrey ;
Balis, Ulysses ;
Marrero, Jorge ;
Zhu, Ji ;
Higgins, Peter D. R. .
BMJ OPEN, 2013, 3 (08)
[44]   Short-term prediction of mortality in patients with systemic lupus erythematosus: Classification of outcomes using random forests [J].
Ward, MM ;
Pajevic, S ;
Dreyfuss, J ;
Malley, JD .
ARTHRITIS & RHEUMATISM-ARTHRITIS CARE & RESEARCH, 2006, 55 (01) :74-80
[45]   Chronic kidney disease [J].
Webster, Angela C. ;
Nagler, Evi V. ;
Morton, Rachael L. ;
Masson, Philip .
LANCET, 2017, 389 (10075) :1238-1252
[46]   Can machine-learning improve cardiovascular risk prediction using routine clinical data? [J].
Weng, Stephen F. ;
Reps, Jenna ;
Kai, Joe ;
Garibaldi, Jonathan M. ;
Qureshi, Nadeem .
PLOS ONE, 2017, 12 (04)
[47]   Is Corneal Arcus Independently Associated With Incident Cardiovascular Disease in Asians? [J].
Wong, Mark Yu Zheng ;
Man, Ryan Eyn Kidd ;
Gupta, Preeti ;
Lim, Sing Hui ;
Lim, Blanche ;
Tham, Yih-Chung ;
Sabanayagam, Charumathi ;
Wong, Tien Yin ;
Cheng, Ching-Yu ;
Lamoureux, Ecosse Luc .
AMERICAN JOURNAL OF OPHTHALMOLOGY, 2017, 183 :99-106
[48]   YOUDEN WJ, 1950, BIOMETRICS, V6, P172, DOI 10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
[50]   Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes [J].
Yu, Wei ;
Liu, Tiebin ;
Valdez, Rodolfo ;
Gwinn, Marta ;
Khoury, Muin J. .
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2010, 10