Making complex prediction rules applicable for readers: Current practice in random forest literature and recommendations

Cited by: 13
Authors
Boulesteix, Anne-Laure [1]
Janitza, Silke [1]
Hornung, Roman [1]
Probst, Philipp [1]
Busen, Hannah [1]
Hapfelmeier, Alexander [2]
Affiliations
[1] Ludwig-Maximilians-Universität München, Institute for Medical Information Processing, Biometry and Epidemiology, Marchioninistr. 15, D-81377 Munich, Germany
[2] Technical University of Munich, Institute of Medical Informatics, Statistics and Epidemiology, Munich, Germany
Keywords
logistic regression; machine learning; prediction rule; reproducibility; reproducible research; machine learning methods; probability estimation; guidelines
DOI
10.1002/bimj.201700243
Chinese Library Classification (CLC)
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Ideally, prediction rules should be published in such a way that readers can apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014-2015 in the field "medical and health sciences", with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is one possible way to ensure that it is applicable; reproducibility is therefore also examined in our survey. The prediction rules presented were applicable in only 2 of the 30 identified papers, while for a further eight prediction rules the necessary information could be obtained by contacting the authors. Various problems, such as nonresponse of the authors, hampered the applicability of the prediction rules in the remaining cases. Based on our experience from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data, including the descriptions of the considered studies and the analysis code, are available as supplementary materials.
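As a minimal sketch of the contrast drawn in the abstract (not code from the article or its supplementary materials), the Python below first applies a hypothetical published logistic regression rule, which a reader can evaluate from the intercept and coefficients alone, and then shows that one random ingredient of a random forest, the per-tree bootstrap samples, can only be regenerated if the seed is reported. The predictor names, coefficient values and the helper functions `logistic_rule` and `bootstrap_indices` are invented for illustration.

```python
import math
import random

# Hypothetical published logistic regression rule: intercept plus one
# coefficient per predictor (values invented for illustration only).
INTERCEPT = -1.5
COEFFICIENTS = {"age": 0.03, "biomarker": 0.8}


def logistic_rule(patient):
    """Apply the published rule: p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    linear_predictor = INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def bootstrap_indices(n_obs, n_trees, seed):
    """Draw one bootstrap sample (indices drawn with replacement) per tree.

    This is only one source of randomness in a random forest; the result
    depends on the seed, so a reader can re-derive it only if the seed
    is known.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_obs) for _ in range(n_obs)] for _ in range(n_trees)]


if __name__ == "__main__":
    # A reader can apply the logistic rule to a new (hypothetical) patient.
    print(round(logistic_rule({"age": 50, "biomarker": 1.2}), 3))
    # Re-running with the same seed reproduces the same bootstrap samples.
    print(bootstrap_indices(10, 3, seed=42) == bootstrap_indices(10, 3, seed=42))
```

A full random forest additionally draws a random subset of candidate predictors at each split, and the random number generator depends on the software package and version, so reporting a seed is generally not sufficient on its own; sharing the fitted model object, or the code, data and software versions, is the more robust route.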
Pages: 1314-1328
Page count: 15