Calibration: the Achilles heel of predictive analytics

Cited by: 1007

Authors:

van Calster, Ben [1,2]

McLernon, David J. [3]

van Smeden, Maarten [2,4]

Wynants, Laure [1,5]

Steyerberg, Ewout W. [2]

Affiliations:

[1] Katholieke Univ Leuven, Dept Dev & Regenerat, Herestr 49 Box 805, B-3000 Leuven, Belgium

[2] Leiden Univ, Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands

[3] Univ Aberdeen, Med Stat Team, Inst Appl Hlth Sci, Sch Med Med Sci & Nutr, Aberdeen, Scotland

[4] Leiden Univ, Med Ctr, Dept Clin Epidemiol, Leiden, Netherlands

[5] Maastricht Univ, Dept Epidemiol, CAPHRI Care & Publ Hlth Res Inst, Maastricht, Netherlands

Source:

BMC MEDICINE | 2019 / Volume 17 / Issue 01

Funding:

Research Foundation Flanders (Belgium);

Keywords:

Calibration; Risk prediction models; Predictive analytics; Overfitting; Heterogeneity; Model performance; LOGISTIC-REGRESSION MODELS; OVARIAN-CANCER; VALIDATION; IMPACT; IVF;

DOI:

10.1186/s12916-019-1466-7

Chinese Library Classification number:

R5 [Internal Medicine];

Discipline classification code:

1002 ; 100201 ;

Abstract:

Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention.

Main text: Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice.

Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.
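
To make the notion of calibration concrete, the following is a minimal sketch (not taken from the article) of a crude, binned comparison between predicted risks and observed event rates. The data are simulated, and the function names `calibration_table` and `mean_calibration` as well as the "overestimating" model are hypothetical, chosen only for illustration. A smoothed calibration curve on a sufficiently large validation sample would be the more informative tool; this binned version only approximates one.

```python
import random


def calibration_table(y_true, y_prob, n_bins=5):
    """Sort by predicted risk, split into roughly equal-size bins, and
    return (bin size, mean predicted risk, observed event rate) per bin."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((len(chunk), mean_pred, obs_rate))
    return rows


def mean_calibration(y_true, y_prob):
    """Observed event rate minus average predicted risk.
    Zero means calibrated on average; negative means risks are overestimated."""
    return sum(y_true) / len(y_true) - sum(y_prob) / len(y_prob)


# Simulated validation data (assumption: true risks are known here,
# which is never the case in practice).
rng = random.Random(0)
true_risk = [rng.uniform(0.05, 0.6) for _ in range(5000)]
outcomes = [1 if rng.random() < r else 0 for r in true_risk]

# Hypothetical miscalibrated model: inflates every risk by 50%.
overestimated = [min(0.99, 1.5 * r) for r in true_risk]

for name, preds in [("well calibrated", true_risk),
                    ("overestimating", overestimated)]:
    print(name, "mean calibration:", round(mean_calibration(outcomes, preds), 3))
    for size, mean_pred, obs_rate in calibration_table(outcomes, preds):
        print(f"  n={size:4d}  predicted={mean_pred:.3f}  observed={obs_rate:.3f}")
```

Both sets of predictions rank patients identically, so they discriminate equally well; only the comparison of predicted against observed risk reveals that the second set systematically overestimates.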

Pages: 7

