Minimum sample size for developing a multivariable prediction model using multinomial logistic regression

Times cited: 76
Authors
Pate, Alexander [1, 10]
Riley, Richard D. [2]
Collins, Gary S. [3, 4]
van Smeden, Maarten [5, 6]
Van Calster, Ben [7, 8, 9]
Ensor, Joie [2]
Martin, Glen P. [1]
Affiliations
机构
[1] Univ Manchester, Fac Biol Med & Hlth, Manchester Acad Hlth Sci Ctr, Div Informat Imaging & Data Sci, Manchester, England
[2] Keele Univ, Ctr Prognosis Res, Sch Med, Keele, Staffs, England
[3] Univ Oxford, Ctr Stat Med, Nuffield Dept Orthopaed Rheumatol & Musculoskeletal, Oxford, England
[4] John Radcliffe Hosp, NIHR Oxford Biomed Res Ctr, Oxford, England
[5] Univ Utrecht, Univ Med Ctr Utrecht, Julius Ctr Hlth Sci, Utrecht, Netherlands
[6] Leiden Univ Med Ctr, Dept Clin Epidemiol, Leiden, Netherlands
[7] Katholieke Univ Leuven, Dept Dev & Regenerat, Leuven, Belgium
[8] Leiden Univ Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands
[9] Katholieke Univ Leuven, EPI Ctr, Leuven, Belgium
[10] Univ Manchester, Manchester M13 9GB, England
Keywords
Clinical prediction models; sample size; multinomial logistic regression; shrinkage; simultaneous confidence intervals; diagnosis; performance
DOI
10.1177/09622802231151220
Chinese Library Classification
R19 [Health care organization and services (health services management)]
Abstract
Aims: Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure that the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. Building on existing criteria developed for binary outcomes, we propose three criteria to determine the minimum required n.

Proposed criteria: The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third aims to ensure that the overall risk is estimated precisely. For criterion (i), we show that the sample size must be based on the anticipated Cox-Snell R² of the distinct 'one-to-one' logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression.

Evaluation of criteria: We tested the performance of proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes and did not require evaluation through simulation.

We illustrate how to implement the sample size criteria through a worked example: the development of a multinomial risk prediction model for tumour type in patients presenting with an ovarian mass. Code is provided for the simulation and the worked example. We will embed the proposed criteria within the pmsampsize R package and Stata modules.
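The three criteria can be illustrated with a short Python sketch. This is neither the authors' code nor the pmsampsize API, and every function name below is hypothetical. For binary outcomes, the sketch uses the established closed-form calculations that these criteria extend. The first is n = p / ((S - 1) ln(1 - R²_CS / S)) for a target shrinkage S, commonly 0.9. The second is S = R²_CS / (R²_CS + δ max(R²_CS)) with δ = 0.05 for the Nagelkerke criterion. The third is n = (1.96 / 0.05)² φ(1 - φ) for estimating an outcome proportion φ within ±0.05. The multinomial parts are assumptions made for illustration. Criterion (i) applies the binary calculation to each one-to-one sub-model and then divides by the proportion of participants in that pair of categories (φ_k + φ_r). Criterion (ii) uses the total number of predictor parameters together with max(R²_CS) = 1 - exp(2 Σ φ_k ln φ_k). Criterion (iii) takes the largest requirement across categories.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Binary-outcome calculation: smallest n whose expected shrinkage is >= `shrinkage`.

    p: number of candidate predictor parameters
    r2_cs: anticipated Cox-Snell R^2 (must be below `shrinkage`)
    """
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def criterion_i(pairs, phis, shrinkage=0.9):
    """Criterion (i), sketched: one binary calculation per one-to-one sub-model.

    pairs: {(k, r): (p_kr, r2_cs_kr)} for each one-to-one logistic model (hypothetical input format)
    phis: anticipated outcome proportions, indexed by category
    """
    n_required = 0.0
    for (k, r), (p_kr, r2_kr) in pairs.items():
        # Participants needed in the sub-model fitted to categories k and r only
        m_kr = n_for_shrinkage(p_kr, r2_kr, shrinkage)
        # ASSUMPTION: scale up to a total n using the share of participants in k or r
        n_required = max(n_required, m_kr / (phis[k] + phis[r]))
    return n_required


def max_r2_cs(phis):
    """Largest achievable Cox-Snell R^2, from the null-model log-likelihood per participant."""
    return 1 - math.exp(2 * sum(f * math.log(f) for f in phis))


def criterion_ii(p_total, r2_cs, phis, delta=0.05):
    """Criterion (ii), sketched: limit optimism in Nagelkerke R^2 to `delta`."""
    s = r2_cs / (r2_cs + delta * max_r2_cs(phis))
    return n_for_shrinkage(p_total, r2_cs, s)


def criterion_iii(phis, margin=0.05):
    """Criterion (iii), sketched: each category's overall risk within +/- `margin` (95% CI)."""
    return max((1.96 / margin) ** 2 * f * (1 - f) for f in phis)


def min_sample_size(pairs, phis, p_total, r2_cs_overall):
    """Minimum n is the largest requirement across the three criteria, rounded up."""
    return math.ceil(max(criterion_i(pairs, phis),
                         criterion_ii(p_total, r2_cs_overall, phis),
                         criterion_iii(phis)))


if __name__ == "__main__":
    # Hypothetical inputs: three categories and two one-to-one sub-models
    phis = [0.5, 0.3, 0.2]
    pairs = {(0, 1): (10, 0.1), (0, 2): (10, 0.2)}
    print(min_sample_size(pairs, phis, p_total=20, r2_cs_overall=0.15))  # 1062
```

With these made-up inputs, criterion (i) dominates. The sub-model for categories 0 and 1 has the smaller anticipated R²_CS, so it drives the requirement to about 1061.3 participants. Taking the maximum across criteria follows the way the binary-outcome criteria are combined.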
Pages: 555-571
Page count: 17