Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

Cited by: 66
Authors
Snell, Kym I. E. [1 ]
Ensor, Joie [1 ]
Debray, Thomas P. A. [2 ,3 ]
Moons, Karel G. M. [2 ,3 ]
Riley, Richard D. [1 ]
Affiliations
[1] Keele Univ, Res Inst Primary Care & Hlth Sci, Keele, Staffs, England
[2] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
[3] Univ Med Ctr Utrecht, Cochrane Netherlands, Utrecht, Netherlands
Funding
UK Medical Research Council
Keywords
Validation; performance statistics; C-statistic; discrimination; calibration; meta-analysis; between-study distribution; heterogeneity; simulation; INDIVIDUAL PARTICIPANT DATA; EXTERNAL VALIDATION; RISK MODELS; CONFIDENCE; SPECIFICITY; SENSITIVITY; INTERVALS; VALIDITY; AREA;
DOI
10.1177/0962280217705678
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
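The recommended scales can be illustrated with a minimal sketch. It assumes each study reports a C-statistic (or E/O ratio) with a standard error, uses the delta method to move the standard error onto the transformed scale, and pools with a DerSimonian-Laird estimator. The choice of estimator, the function names, and the input values are all assumptions made for illustration; they are not taken from the paper, whose own analyses may use a different estimation method.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit_c_with_se(c, se_c):
    # Delta method: SE(logit C) is approximately SE(C) / (C * (1 - C)).
    return logit(c), se_c / (c * (1.0 - c))

def log_eo_with_se(eo, se_eo):
    # Delta method: SE(log E/O) is approximately SE(E/O) / (E/O).
    return math.log(eo), se_eo / eo

def random_effects_dl(estimates, ses):
    # DerSimonian-Laird random-effects pooling (an assumed, simple estimator).
    k = len(estimates)
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    scale = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / scale)  # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, se_pooled, tau2

# Hypothetical per-study C-statistics and standard errors (not real data).
c_stats = [0.72, 0.78, 0.81, 0.69, 0.75]
c_ses = [0.02, 0.03, 0.025, 0.04, 0.02]

transformed = [logit_c_with_se(c, s) for c, s in zip(c_stats, c_ses)]
y = [t[0] for t in transformed]
se = [t[1] for t in transformed]

pooled_logit, pooled_se, tau2 = random_effects_dl(y, se)

# Back-transform the summary and its 95% confidence interval to the C scale.
pooled_c = inv_logit(pooled_logit)
ci_lower = inv_logit(pooled_logit - 1.96 * pooled_se)
ci_upper = inv_logit(pooled_logit + 1.96 * pooled_se)

print(f"Pooled C = {pooled_c:.3f} (95% CI {ci_lower:.3f} to {ci_upper:.3f})")
print(f"tau^2 on logit scale = {tau2:.4f}")
```

Pooling on the logit scale and back-transforming keeps the summary and its interval inside (0, 1). The same `random_effects_dl` helper could be applied to the output of `log_eo_with_se`, with `math.exp` as the back-transformation for E/O.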
Pages: 3505-3522
Page count: 18