Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models

Cited by: 6

Authors:

Piccininni, Marco [1,2,3]

Wechsung, Maximilian [4]

Van Calster, Ben [5,6,7]

Rohmann, Jessica L. [8]

Konigorski, Stefan [1,2,9]

van Smeden, Maarten [10]

Affiliations:

[1] Hasso Plattner Inst Digital Engn, Digital Hlth & Machine Learning Res Grp, Potsdam, Germany

[2] Univ Potsdam, Digital Engn Fac, Potsdam, Germany

[3] Charite Univ Med Berlin, Inst Publ Hlth, Berlin, Germany

[4] York Univ, Dept Math & Stat, Toronto, ON, Canada

[5] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium

[6] Leiden Univ Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands

[7] Katholieke Univ Leuven, Leuven Unit Hlth Technol Assessment Res LUHTAR, Leuven, Belgium

[8] Charite Univ Med Berlin, Ctr Stroke Res Berlin, Berlin, Germany

[9] Hasso Plattner Inst Digital Hlth Mt Sinai, Icahn Sch Med Mt Sinai, New York, NY USA

[10] Univ Utrecht, UMC Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2024 / Volume 155

Funding:

UK Medical Research Council;

Keywords:

Class imbalance; Prediction; Calibration; Discrimination; Undersampling; LOGISTIC-REGRESSION;

DOI:

10.1016/j.jbi.2024.104666

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Objective: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies involving manipulations of the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies on clinical prediction models' internal validity in terms of calibration and discrimination performances. Methods: We used both heuristic intuition and formal mathematical reasoning to characterize the relations between conditional probabilities of interest and probabilities targeted when using random undersampling or oversampling. We propose a plug-in estimator that represents a natural correction for predictions obtained from models that have been trained on artificially balanced datasets ("naïve" models). We conducted a Monte Carlo simulation with two different data generation processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (in terms of Area Under the ROC, AUC) for logistic regression and tree-based prediction models. Results: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance. Conclusion: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when aiming at providing calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
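
The abstract does not spell out the plug-in estimator, so the following is only a minimal sketch of a standard prior-shift correction that is similar in spirit, not necessarily the authors' exact formula. It assumes resampling depended on the outcome class only, and it rescales the odds of a "naïve" prediction by the ratio of the original to the resampled outcome odds. The function name and parameters are hypothetical.

```python
def correct_resampled_probability(p_naive, prevalence_original,
                                  prevalence_resampled=0.5):
    """Map a prediction from a model trained on resampled data back to
    the original outcome prevalence (illustrative odds rescaling only)."""
    # Probabilities of exactly 0 or 1 are fixed points of the correction.
    if p_naive <= 0.0 or p_naive >= 1.0:
        return p_naive
    odds_naive = p_naive / (1.0 - p_naive)
    # Ratio of outcome odds in the original data to those in the
    # artificially balanced training data.
    odds_ratio = (prevalence_original / (1.0 - prevalence_original)) / (
        prevalence_resampled / (1.0 - prevalence_resampled))
    odds_corrected = odds_naive * odds_ratio
    return odds_corrected / (1.0 + odds_corrected)


# Hypothetical inputs: naïve predictions from a model trained on a 50/50
# resampled dataset, where the original outcome prevalence was 10%.
naive_predictions = [0.2, 0.5, 0.8]
corrected = [correct_resampled_probability(p, 0.10) for p in naive_predictions]
```

Because the mapping is strictly increasing, it leaves the ranking of predictions unchanged, which is consistent with the abstract's statement that the corrected models achieve the same discrimination (AUC) while differing in calibration.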

Pages: 10
