A note on estimating the Cox-Snell R² from a reported C statistic (AUROC) to inform sample size calculations for developing a prediction model with a binary outcome

Cited by: 32

Authors:

Riley, Richard D. [1]

Van Calster, Ben [2,3]

Collins, Gary S. [4,5]

Affiliations:

[1] Keele Univ, Ctr Prognosis Res, Sch Med, Keele ST5 5BG, Staffs, England

[2] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium

[3] Leiden Univ, Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands

[4] Univ Oxford, Ctr Stat Med, Nuffield Dept Orthopaed Rheumatol & Musculoskelet, Oxford, England

[5] John Radcliffe Hosp, NIHR Oxford Biomed Res Ctr, Oxford, England

Source:

STATISTICS IN MEDICINE | 2021 / Volume 40 / Issue 04

Keywords:

clinical prediction model; C statistic (AUROC); R squared; sample size

DOI:

10.1002/sim.8806

Chinese Library Classification:

Q [Biological Sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome. As for any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the outcome proportion and a conservative estimate for the overall fit of the developed model, as measured by the Cox-Snell R² (proportion of variance explained), must be specified. This proposal raises the question of how to identify a plausible value for R² in advance of model development. Our articles suggest researchers should identify R² from closely related models already published in their field. In this letter, we present details on how to derive R² using the reported C statistic (AUROC) for such existing prediction models with a binary outcome. The C statistic is commonly reported, and so our approach allows researchers to obtain R² for subsequent sample size calculations for new models. Stata and R code are provided, along with a small simulation study.
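The idea of deriving a Cox-Snell R² from a reported C statistic and outcome proportion can be illustrated with a short simulation. The following is a minimal Python sketch, not the Stata or R code the authors provide. It assumes that the model's linear predictor is normally distributed with unit variance in both outcome groups, with mean 0 in non-events and mean mu in events, so that C = Φ(mu/√2). The function name `approximate_r2_cox_snell` and its parameters `n` and `seed` are hypothetical, chosen only for this illustration.

```python
import math
import random
from statistics import NormalDist


def approximate_r2_cox_snell(c_statistic, outcome_proportion, n=100_000, seed=2021):
    """Simulation-based approximation of the Cox-Snell R^2 (hypothetical helper)."""
    n_events = round(outcome_proportion * n)
    if not 0.5 <= c_statistic < 1.0 or not 0 < n_events < n:
        raise ValueError("need 0.5 <= C < 1 and an outcome proportion inside (0, 1)")
    rng = random.Random(seed)

    # Assumption: LP ~ N(0, 1) in non-events and N(mu, 1) in events,
    # which implies C = Phi(mu / sqrt(2)), i.e. mu = sqrt(2) * Phi^{-1}(C).
    mu = math.sqrt(2.0) * NormalDist().inv_cdf(c_statistic)
    y = [1] * n_events + [0] * (n - n_events)
    lp = [rng.gauss(mu if yi else 0.0, 1.0) for yi in y]

    # Fit the logistic regression y ~ a + b * lp by Newton-Raphson.
    a, b = math.log(n_events / (n - n_events)), 0.0
    for _ in range(50):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 0.5 * (1.0 + math.tanh(0.5 * (a + b * x)))  # stable sigmoid
            r, w = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break

    # Log-likelihood of the fitted model, using a stable log(1 + exp(z)).
    ll_model = 0.0
    for x, yi in zip(lp, y):
        z = a + b * x
        ll_model += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))

    # Log-likelihood of the intercept-only (null) model.
    p0 = n_events / n
    ll_null = n_events * math.log(p0) + (n - n_events) * math.log(1.0 - p0)

    # Cox-Snell R^2 = 1 - exp(-LR / n), with LR the likelihood ratio statistic.
    return 1.0 - math.exp(-2.0 * (ll_model - ll_null) / n)


if __name__ == "__main__":
    print(approximate_r2_cox_snell(c_statistic=0.8, outcome_proportion=0.1))
```

Because the value is simulated, it varies slightly with `n` and `seed`; a larger `n` reduces this noise at the cost of run time.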

Pages: 859-864

Number of pages: 6
