Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models

Cited by: 6
Authors
Piccininni, Marco [1 ,2 ,3 ]
Wechsung, Maximilian [4 ]
Van Calster, Ben [5 ,6 ,7 ]
Rohmann, Jessica L. [8 ]
Konigorski, Stefan [1 ,2 ,9 ]
van Smeden, Maarten [10 ]
Affiliations
[1] Hasso Plattner Inst Digital Engn, Digital Hlth & Machine Learning Res Grp, Potsdam, Germany
[2] Univ Potsdam, Digital Engn Fac, Potsdam, Germany
[3] Charite Univ Med Berlin, Inst Publ Hlth, Berlin, Germany
[4] York Univ, Dept Math & Stat, Toronto, ON, Canada
[5] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium
[6] Leiden Univ Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands
[7] Katholieke Univ Leuven, Leuven Unit Hlth Technol Assessment Res LUHTAR, Leuven, Belgium
[8] Charite Univ Med Berlin, Ctr Stroke Res Berlin, Berlin, Germany
[9] Hasso Plattner Inst Digital Hlth Mt Sinai, Icahn Sch Med Mt Sinai, New York, NY USA
[10] Univ Utrecht, UMC Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Funding
UK Medical Research Council;
Keywords
Class imbalance; Prediction; Calibration; Discrimination; Undersampling; Logistic regression
DOI
10.1016/j.jbi.2024.104666
Chinese Library Classification (CLC)
TP39 [Computer applications];
Subject Classification Codes
081203; 0835;
Abstract
Objective: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies that manipulate the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies for the internal validity of clinical prediction models in terms of calibration and discrimination.
Methods: We used both heuristic intuition and formal mathematical reasoning to characterize the relationship between the conditional probabilities of interest and the probabilities targeted when random undersampling or oversampling is applied. We propose a plug-in estimator as a natural correction for predictions obtained from models trained on artificially balanced datasets ("naïve" models). We conducted a Monte Carlo simulation with two different data-generating processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (area under the ROC curve, AUC) for logistic regression and tree-based prediction models.
Results: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance.
Conclusion: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when the aim is to provide calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
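To make the abstract's idea concrete, the minimal Python sketch below (not taken from the article) illustrates the standard prior-correction that a plug-in estimator of this kind typically corresponds to: predictions from a model trained on an artificially balanced (undersampled) dataset are mapped back to the original outcome prevalence by rescaling the predicted odds by the ratio of the original prior odds to the sampling prior odds. The simulated data, the 1:1 undersampling ratio, and the use of scikit-learn are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code): prior-correction ("plug-in") adjustment of
# predictions from a "naive" model trained on an artificially balanced dataset.
# Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Simulate an imbalanced binary outcome (roughly 3-5% events) with one predictor.
n = 20_000
x = rng.normal(size=n)
true_logit = -3.5 + 1.0 * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Random undersampling of the majority class to a 1:1 ratio (illustrative choice).
idx_pos = np.flatnonzero(y == 1)
idx_neg = rng.choice(np.flatnonzero(y == 0), size=idx_pos.size, replace=False)
idx_bal = np.concatenate([idx_pos, idx_neg])

# "Naive" model: fitted on the artificially balanced sample.
naive = LogisticRegression().fit(x[idx_bal].reshape(-1, 1), y[idx_bal])
p_naive = naive.predict_proba(x.reshape(-1, 1))[:, 1]

# Plug-in correction: rescale the predicted odds by (original prior odds) / (sampling prior odds),
# where pi is the prevalence in the original data and pi_s the prevalence in the balanced sample.
pi, pi_s = y.mean(), y[idx_bal].mean()
odds_corrected = (p_naive / (1 - p_naive)) * (pi / (1 - pi)) / (pi_s / (1 - pi_s))
p_corrected = odds_corrected / (1 + odds_corrected)

print(f"Observed prevalence:           {y.mean():.3f}")
print(f"Mean naive predicted risk:     {p_naive.mean():.3f}  (miscalibrated)")
print(f"Mean corrected predicted risk: {p_corrected.mean():.3f}  (close to prevalence)")
```

Because this correction is a strictly monotone transformation of the naïve predictions, it changes calibration but leaves rank-based discrimination measures such as the AUC unchanged, which is consistent with the abstract's conclusion that random resampling does not generally improve AUC.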
Pages: 10