Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models

Cited by: 6
Authors
Piccininni, Marco [1 ,2 ,3 ]
Wechsung, Maximilian [4 ]
Van Calster, Ben [5 ,6 ,7 ]
Rohmann, Jessica L. [8 ]
Konigorski, Stefan [1 ,2 ,9 ]
van Smeden, Maarten [10 ]
Affiliations
[1] Hasso Plattner Inst Digital Engn, Digital Hlth & Machine Learning Res Grp, Potsdam, Germany
[2] Univ Potsdam, Digital Engn Fac, Potsdam, Germany
[3] Charite Univ Med Berlin, Inst Publ Hlth, Berlin, Germany
[4] York Univ, Dept Math & Stat, Toronto, ON, Canada
[5] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium
[6] Leiden Univ Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands
[7] Katholieke Univ Leuven, Leuven Unit Hlth Technol Assessment Res LUHTAR, Leuven, Belgium
[8] Charite Univ Med Berlin, Ctr Stroke Res Berlin, Berlin, Germany
[9] Hasso Plattner Inst Digital Hlth Mt Sinai, Icahn Sch Med Mt Sinai, New York, NY USA
[10] Univ Utrecht, UMC Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Funding
UK Medical Research Council;
Keywords
Class imbalance; Prediction; Calibration; Discrimination; Undersampling; Logistic regression
DOI
10.1016/j.jbi.2024.104666
Chinese Library Classification (CLC)
TP39 [Computer applications];
Subject Classification Codes
081203; 0835;
Abstract
Objective: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies that manipulate the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies for the internal validity of clinical prediction models in terms of calibration and discrimination.
Methods: We used both heuristic intuition and formal mathematical reasoning to characterize the relationship between the conditional probabilities of interest and the probabilities targeted when random undersampling or oversampling is applied. We propose a plug-in estimator as a natural correction for predictions obtained from models trained on artificially balanced datasets ("naïve" models). We conducted a Monte Carlo simulation with two different data-generating processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (area under the ROC curve, AUC) for logistic regression and tree-based prediction models.
Results: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance.
Conclusion: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when the aim is to provide calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
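To make the abstract's idea concrete, the minimal Python sketch below (not taken from the article) illustrates the standard prior-correction that a plug-in estimator of this kind typically corresponds to: predictions from a model trained on an artificially balanced (undersampled) dataset are mapped back to the original outcome prevalence by rescaling the predicted odds by the ratio of the original prior odds to the sampling prior odds. The simulated data, the 1:1 undersampling ratio, and the use of scikit-learn are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code): prior-correction ("plug-in") adjustment of
# predictions from a "naive" model trained on an artificially balanced dataset.
# Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Simulate an imbalanced binary outcome (roughly 3-5% events) with one predictor.
n = 20_000
x = rng.normal(size=n)
true_logit = -3.5 + 1.0 * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Random undersampling of the majority class to a 1:1 ratio (illustrative choice).
idx_pos = np.flatnonzero(y == 1)
idx_neg = rng.choice(np.flatnonzero(y == 0), size=idx_pos.size, replace=False)
idx_bal = np.concatenate([idx_pos, idx_neg])

# "Naive" model: fitted on the artificially balanced sample.
naive = LogisticRegression().fit(x[idx_bal].reshape(-1, 1), y[idx_bal])
p_naive = naive.predict_proba(x.reshape(-1, 1))[:, 1]

# Plug-in correction: rescale the predicted odds by (original prior odds) / (sampling prior odds),
# where pi is the prevalence in the original data and pi_s the prevalence in the balanced sample.
pi, pi_s = y.mean(), y[idx_bal].mean()
odds_corrected = (p_naive / (1 - p_naive)) * (pi / (1 - pi)) / (pi_s / (1 - pi_s))
p_corrected = odds_corrected / (1 + odds_corrected)

print(f"Observed prevalence:           {y.mean():.3f}")
print(f"Mean naive predicted risk:     {p_naive.mean():.3f}  (miscalibrated)")
print(f"Mean corrected predicted risk: {p_corrected.mean():.3f}  (close to prevalence)")
```

Because this correction is a strictly monotone transformation of the naïve predictions, it changes calibration but leaves rank-based discrimination measures such as the AUC unchanged, which is consistent with the abstract's conclusion that random resampling does not generally improve AUC.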
Pages: 10