Beyond discrimination: A comparison of calibration methods and clinical usefulness of predictive models of readmission risk

Times cited: 56
Authors
Walsh, Colin G. [1, 2, 3]
Sharma, Kavya [1]
Hripcsak, George [4]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN 37205 USA
[2] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN 37205 USA
[3] Vanderbilt Univ, Med Ctr, Dept Psychiat, Nashville, TN 37205 USA
[4] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
Keywords
Readmissions; Predictive analytics; Calibration; Utility analysis; Clinical usefulness; LOGISTIC-REGRESSION MODELS; HOSPITAL READMISSION; 30-DAY READMISSION; IMPROVEMENT; SELECTION; DISEASE; COST
DOI
10.1016/j.jbi.2017.10.008
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: Before predictive models are implemented in new settings, analyses of calibration and clinical usefulness are as important as analyses of discrimination, yet they are discussed less often. Calibration is the degree to which a model's predicted probabilities match the observed outcome prevalence. Clinical usefulness refers to the utilities, costs, and harms of using a predictive model in practice. A decision-analytic approach to calibrating models and selecting an optimal intervention threshold may help maximize the impact of readmission-risk and other preventive interventions.
Objectives: To select a pragmatic means of calibrating predictive models that requires a minimal amount of validation data and performs well in practice, and to evaluate the impact of miscalibration on utility and cost via clinical usefulness analyses.
Materials and methods: Observational, retrospective cohort study using electronic health record data from 120,000 inpatient admissions at an urban academic medical center in Manhattan. The primary outcome was thirty-day readmission for three causes: all-cause, congestive heart failure, and chronic coronary atherosclerotic disease. Predictive models were built with L1-regularized logistic regression. The calibration methods compared were Platt Scaling, Logistic Calibration, and Prevalence Adjustment. Performance of the predictive models and calibration methods was assessed via discrimination (c-statistic), calibration (Spiegelhalter Z-statistic, root mean square error [RMSE] of binned predictions, Sanders and Murphy resolutions of the Brier score, and calibration slope and intercept), and clinical usefulness (utility terms represented as costs). The amount of validation data needed to apply each calibration algorithm was also assessed.
Results: C-statistics by diagnosis ranged from 0.7 for all-cause readmission to 0.86 (0.78-0.93) for congestive heart failure. Logistic Calibration and Platt Scaling performed best, but this difference became apparent only when multiple calibration metrics, in particular calibration slopes and intercepts, were analyzed simultaneously. Clinical usefulness analyses identified optimal risk thresholds, which varied by reason for readmission, outcome prevalence, and calibration algorithm. Utility analyses also suggested maximum tolerable intervention costs, e.g., $1720 for all-cause readmissions based on a published cost of readmission of $11,862.
Conclusions: The choice of calibration method depends on the availability of validation data and on performance. Improperly calibrated models may contribute to higher intervention costs, as measured via clinical usefulness. Decision-makers must understand the utilities or costs inherent in the use case at hand to assess usefulness; doing so yields the optimal risk threshold for triggering intervention, together with limits on intervention cost.
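The abstract names its calibration methods and metrics without giving formulas. The following is a minimal pure-Python sketch, not the authors' code, of several of the quantities involved: Spiegelhalter's Z-statistic; the calibration intercept a and slope b from the logistic recalibration model logit P(y = 1) = a + b·logit(p), whose fitted (a, b) can also be applied as Logistic Calibration; a prior-shift form of Prevalence Adjustment; and a simple decision-analytic threshold. All function names are hypothetical. The cost model (intervene when p·e·R exceeds the intervention cost c, with a hypothetical relative effectiveness e and readmission cost R) is an assumption and is not necessarily the utility model used in the paper. Some authors estimate the calibration intercept with the slope fixed at 1 instead of jointly, as done here. Platt Scaling fits the same kind of sigmoid to a model's raw scores; for a logistic regression, whose score is already logit(p), it differs mainly in Platt's smoothed targets, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Numerically stable inverse of logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spiegelhalter_z(y: Sequence[int], p: Sequence[float]) -> float:
    """Spiegelhalter's Z; |Z| > 1.96 flags miscalibration at the 5% level."""
    num = sum((yi - pi) * (1.0 - 2.0 * pi) for yi, pi in zip(y, p))
    var = sum((1.0 - 2.0 * pi) ** 2 * pi * (1.0 - pi) for pi in p)
    return num / math.sqrt(var)


def fit_recalibration(y: Sequence[int], p: Sequence[float],
                      iters: int = 100) -> Tuple[float, float]:
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson.

    Returns (a, b): calibration intercept and slope (ideal: a = 0, b = 1).
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score (gradient) for a
            g1 += (yi - mu) * xi   # score (gradient) for b
            i00 += w               # Fisher information entries
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det  # solve the 2x2 Newton system
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return a, b


def logistic_calibration(p_new: Sequence[float], a: float, b: float) -> List[float]:
    """Apply a fitted (a, b) recalibration to new predictions."""
    return [sigmoid(a + b * logit(pi)) for pi in p_new]


def prevalence_adjust(p: Sequence[float], train_prev: float,
                      target_prev: float) -> List[float]:
    """Shift each prediction's log-odds by the change in outcome log-odds."""
    shift = logit(target_prev) - logit(train_prev)
    return [sigmoid(logit(pi) + shift) for pi in p]


def treatment_threshold(intervention_cost: float, readmission_cost: float,
                        effectiveness: float) -> float:
    """Risk at which expected savings p * e * R equal the intervention cost c."""
    return intervention_cost / (effectiveness * readmission_cost)


def max_tolerable_cost(risk: float, readmission_cost: float,
                       effectiveness: float) -> float:
    """Largest intervention cost c still worth paying at the given risk."""
    return risk * effectiveness * readmission_cost
```

For example, under this simplified model and a hypothetical intervention that prevents a quarter of readmissions, `max_tolerable_cost(0.2, 11862, 0.25)` gives about $593 for a patient at 20% risk, using the published readmission cost quoted in the abstract. Because the threshold depends directly on p, a miscalibrated model shifts which patients cross it, which is one way miscalibration can raise intervention costs.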
Pages: 9-18
Number of pages: 10