Empirical evaluation of internal validation methods for prediction in large-scale clinical data with rare-event outcomes: a case study in suicide risk prediction

Cited: 6
Authors
Coley, R. Yates [1 ,2 ]
Liao, Qinqing [2 ]
Simon, Noah [1 ,2 ]
Shortreed, Susan M. [1 ,2 ]
Affiliations
[1] Kaiser Permanente Washington Hlth Res Inst, 1730 Minor Ave 1600, Seattle, WA 98101 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
Funding
US Agency for Healthcare Research and Quality
Keywords
Bootstrap; Clinical prediction; Cross-validation; Machine learning; Optimism; Random forest; Risk stratification; Split-sample; CROSS-VALIDATION; MODELS; REGRESSION; PRECISION;
DOI
10.1186/s12874-023-01844-5
Chinese Library Classification
R19 [Health organization and services (health services administration)]
Abstract
Background: There is increasing interest in clinical prediction models for rare outcomes such as suicide, psychiatric hospitalizations, and opioid overdose. Accurate model validation is needed to guide model selection and decisions about whether and how prediction models should be used. Split-sample estimation and validation of clinical prediction models, in which data are divided into training and testing sets, may reduce predictive accuracy and the precision of validation. Using all data for both estimation and validation increases the sample size for both procedures, but validation must then account for overfitting, or optimism. Our study compared split-sample and entire-sample methods for estimating and validating a suicide prediction model.
Methods: We compared the performance of random forest models estimated in a sample of 9,610,318 mental health visits ("entire-sample") and in a 50% subset ("split-sample"), as evaluated in a prospective validation sample of 3,754,137 visits. We assessed the optimism of three internal validation approaches: for the split-sample prediction model, validation in the held-out testing set; and for the entire-sample model, cross-validation and bootstrap optimism correction.
Results: The split-sample and entire-sample prediction models showed similar prospective performance; the area under the curve (AUC) and 95% confidence interval were 0.81 (0.77-0.85) for both. Performance estimates evaluated in the testing set for the split-sample model (AUC = 0.85 [0.82-0.87]) and via cross-validation for the entire-sample model (AUC = 0.83 [0.81-0.85]) accurately reflected prospective performance. Validation of the entire-sample model with bootstrap optimism correction overestimated prospective performance (AUC = 0.88 [0.86-0.89]). Measures of classification accuracy, including sensitivity and positive predictive value at the 99th, 95th, 90th, and 75th percentiles of the risk score distribution, supported the same conclusion: bootstrap optimism correction overestimated classification accuracy in the prospective validation set.
Conclusions: While previous literature has demonstrated the validity of bootstrap optimism correction for parametric models in small samples, this approach did not accurately validate the performance of a rare-event prediction model estimated with random forests in a large clinical dataset. Cross-validation of prediction models estimated with all available data provides accurate independent validation while maximizing sample size.
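The three internal validation approaches compared in the abstract can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn; the model settings, sample sizes, and number of bootstrap replicates are illustrative and much smaller than in the study, not a reproduction of the authors' analysis.

```python
# Sketch of three internal validation approaches for a random forest
# classifier: (1) split-sample hold-out, (2) cross-validation on the entire
# sample, (3) bootstrap optimism correction. Synthetic data; settings are
# illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + rng.normal(size=2000) > 1.5).astype(int)  # uncommon outcome

# 1) Split-sample: estimate on half the data, validate on the held-out half.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
split_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
auc_split = roc_auc_score(y_te, split_model.predict_proba(X_te)[:, 1])

# 2) Cross-validation on the entire sample: AUC from out-of-fold predictions.
entire_model = RandomForestClassifier(n_estimators=50, random_state=0)
oof = cross_val_predict(entire_model, X, y, cv=10, method="predict_proba")[:, 1]
auc_cv = roc_auc_score(y, oof)

# 3) Bootstrap optimism correction: apparent (in-sample) AUC minus the average
#    optimism, where each replicate's optimism is the bootstrap model's AUC on
#    its bootstrap sample minus its AUC on the original sample.
entire_model.fit(X, y)
auc_apparent = roc_auc_score(y, entire_model.predict_proba(X)[:, 1])
optimism = []
for b in range(20):  # far more replicates would be used in practice
    idx = rng.integers(0, len(y), len(y))
    m = RandomForestClassifier(n_estimators=50, random_state=b).fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)
auc_boot_corrected = auc_apparent - float(np.mean(optimism))

print(round(auc_split, 3), round(auc_cv, 3), round(auc_boot_corrected, 3))
```

Because a random forest fits its own training data almost perfectly, the apparent AUC is near 1 and the optimism estimate carries the whole correction, which is one intuition for why bootstrap optimism correction can misbehave for flexible models like those studied here.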
Pages: 10