Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models

Cited by: 6
Authors
Piccininni, Marco [1 ,2 ,3 ]
Wechsung, Maximilian [4 ]
Van Calster, Ben [5 ,6 ,7 ]
Rohmann, Jessica L. [8 ]
Konigorski, Stefan [1 ,2 ,9 ]
van Smeden, Maarten [10 ]
Affiliations
[1] Hasso Plattner Inst Digital Engn, Digital Hlth & Machine Learning Res Grp, Potsdam, Germany
[2] Univ Potsdam, Digital Engn Fac, Potsdam, Germany
[3] Charite Univ Med Berlin, Inst Publ Hlth, Berlin, Germany
[4] York Univ, Dept Math & Stat, Toronto, ON, Canada
[5] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium
[6] Leiden Univ Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands
[7] Katholieke Univ Leuven, Leuven Unit Hlth Technol Assessment Res LUHTAR, Leuven, Belgium
[8] Charite Univ Med Berlin, Ctr Stroke Res Berlin, Berlin, Germany
[9] Hasso Plattner Inst Digital Hlth Mt Sinai, Icahn Sch Med Mt Sinai, New York, NY USA
[10] Univ Utrecht, UMC Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Funding
UK Medical Research Council;
Keywords
Class imbalance; Prediction; Calibration; Discrimination; Undersampling; LOGISTIC-REGRESSION;
DOI
10.1016/j.jbi.2024.104666
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Objective: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies that manipulate the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies for the internal validity of clinical prediction models, in terms of calibration and discrimination performance. Methods: We used both heuristic intuition and formal mathematical reasoning to characterize the relations between the conditional probabilities of interest and the probabilities targeted when using random undersampling or oversampling. We propose a plug-in estimator that represents a natural correction for predictions obtained from models that have been trained on artificially balanced datasets ("naïve" models). We conducted a Monte Carlo simulation with two different data generation processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (in terms of area under the ROC curve, AUC) for logistic regression and tree-based prediction models. Results: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance. Conclusion: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when aiming at providing calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
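The abstract does not spell out the plug-in estimator, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that resampling is random within each outcome class (it depends on the outcome only, not on the predictors). Under that assumption the odds predicted by a model trained on the resampled data equal the original odds multiplied by a constant, and dividing that constant out gives a prediction on the original prevalence scale. The function name and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def correct_resampled_probability(p_resampled, prev_original, prev_resampled):
    """Map predicted risks from a model trained on resampled data back to the
    original prevalence scale (standard prior-shift correction; an assumption
    here, not necessarily the exact estimator proposed in the article).

    p_resampled    : predicted probabilities from the "naive" model
    prev_original  : outcome prevalence in the original training data
    prev_resampled : outcome prevalence after under- or oversampling
    """
    p = np.asarray(p_resampled, dtype=float)
    # Resampling within outcome classes multiplies the odds by
    # [prev_resampled / (1 - prev_resampled)] / [prev_original / (1 - prev_original)].
    # "ratio" is the inverse of that factor, so it undoes the shift.
    ratio = (prev_original / (1.0 - prev_original)) / (
        prev_resampled / (1.0 - prev_resampled)
    )
    corrected_odds_numerator = p * ratio
    # Convert the corrected odds p*ratio / (1 - p) back into a probability.
    return corrected_odds_numerator / (corrected_odds_numerator + (1.0 - p))


if __name__ == "__main__":
    # Hypothetical numbers: 10% prevalence originally, balanced to 50% by undersampling.
    naive = np.array([0.2, 0.5, 0.9])
    print(correct_resampled_probability(naive, prev_original=0.10, prev_resampled=0.50))
```

Because the correction is a strictly increasing function of the naïve prediction, it leaves the ranking of individuals unchanged. That is consistent with the abstract's statement that the corrected models achieve the same discrimination (AUC) while differing in calibration.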
Pages: 10