Modelling the hierarchical structure in datasets with very small clusters: a simulation study to explore the effect of the proportion of clusters when the outcome is continuous

Cited by: 28
Authors
Sauzet, O. [1]
Wright, K. C.
Marston, L. [2]
Brocklehurst, P. [3]
Peacock, J. L. [4]
Affiliations
[1] Univ Bielefeld, AG Epidemiol & Int Publ Hlth, D-33615 Bielefeld, Germany
[2] UCL, Dept Primary Care & Populat Hlth, London, England
[3] UCL, Inst Womens Hlth, London, England
[4] Kings Coll London, Div Hlth & Social Care Res, London SE1 3QD, England
Keywords
non-independent data; small clusters; mixed model; linear regression; simulations; FREQUENCY OSCILLATORY VENTILATION; RANDOMIZED-TRIALS; REGRESSION; TWIN;
DOI
10.1002/sim.5638
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
In cluster-randomised trials, the problem of non-independence within clusters is well known, and appropriate statistical analysis documented. Clusters typically seen in cluster trials are large in size and few in number, whereas datasets of preterm infants incorporate clusters of size two (twins), size three (triplets) and so on, with the majority of infants being in 'clusters' of size one. In such situations, it is unclear whether adjustment for clustering is needed or even possible. In this paper, we compared analyses allowing for clustering (linear mixed model) with analyses ignoring clustering (linear regression). Through simulations based on two real datasets, we explored estimation bias in predictors of a continuous outcome in different size datasets typical of preterm samples, with varying percentages of twins. Overall, the biases for estimated coefficients were similar for linear regression and mixed models, but the standard errors were consistently much less well estimated when using a linear model. Non-convergence was rare but was observed in approximately 5% of mixed models for samples below 200 and percentage of twins 2% or less. We conclude that in datasets with small clusters, mixed models should be the method of choice irrespective of the percentage of twins. If the mixed model does not converge, a linear regression can be fitted, but standard error will be underestimated, and so type I error may be inflated. Copyright (c) 2012 John Wiley & Sons, Ltd.
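The abstract's warning that linear regression underestimates standard errors when clustering is ignored can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the sample size, twin proportion, intraclass correlation, effect size and the choice of a mother-level predictor are all illustrative assumptions, exaggerated so the effect is easy to see, and only the linear regression side is fitted (no mixed model). It compares the average model-based standard error of the slope with the empirical standard deviation of the slope estimates across replications.

```python
import math
import random
import statistics


def simulate_dataset(n_mothers, prop_twin, icc, beta, rng):
    """Simulate singletons and twins; twins share a mother-level predictor
    and a mother-level random effect (total outcome variance is 1)."""
    sd_u = math.sqrt(icc)        # between-cluster standard deviation
    sd_e = math.sqrt(1.0 - icc)  # within-cluster standard deviation
    xs, ys = [], []
    for _ in range(n_mothers):
        x = rng.gauss(0.0, 1.0)  # hypothetical mother-level predictor
        u = rng.gauss(0.0, sd_u)
        size = 2 if rng.random() < prop_twin else 1
        for _ in range(size):
            xs.append(x)
            ys.append(beta * x + u + rng.gauss(0.0, sd_e))
    return xs, ys


def ols_slope_and_se(xs, ys):
    """Simple linear regression that ignores clustering.
    Returns the slope and its usual model-based standard error."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def run_simulation(n_reps=500, n_mothers=150, prop_twin=0.5,
                   icc=0.7, beta=0.5, seed=1):
    """All default values are assumptions chosen for illustration only."""
    rng = random.Random(seed)
    slopes, ses = [], []
    for _ in range(n_reps):
        xs, ys = simulate_dataset(n_mothers, prop_twin, icc, beta, rng)
        slope, se = ols_slope_and_se(xs, ys)
        slopes.append(slope)
        ses.append(se)
    return {
        "mean_slope": statistics.fmean(slopes),
        "empirical_sd": statistics.stdev(slopes),
        "mean_model_se": statistics.fmean(ses),
    }


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the mean slope should sit close to the true value, while the average model-based standard error falls below the empirical spread of the estimates, which is the pattern the abstract describes. Lowering `prop_twin` towards the few-percent range typical of preterm samples shrinks the gap.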
Pages: 1429-1438
Number of pages: 10