Sparse data bias: a problem hiding in plain sight

Cited by: 686
Authors
Greenland, Sander [1, 2]
Mansournia, Mohammad Ali [3]
Altman, Douglas G. [4]
Affiliations
[1] Univ Calif Los Angeles, Dept Epidemiol, Los Angeles, CA USA
[2] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA
[3] Univ Tehran Med Sci, Sch Publ Hlth, Dept Epidemiol & Biostat, POB 14155-6446, Tehran, Iran
[4] Univ Oxford, Nuffield Dept Orthopaed Rheumatol & Musculoskelet, Ctr Stat Med, Oxford, England
Source
BMJ-BRITISH MEDICAL JOURNAL | 2016 / Volume 353
Keywords
LOGISTIC-REGRESSION; SELECTION; MODEL; LIKELIHOOD; SIMULATION; REDUCTION; EVENTS; IMPACT; RISK;
DOI
10.1136/bmj.i1981
Chinese Library Classification number
R5 [Internal medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Effects of treatment or other exposure on outcome events are commonly measured by ratios of risks, rates, or odds. Adjusted versions of these measures are usually estimated by maximum likelihood regression (eg, logistic, Poisson, or Cox modelling). But resulting estimates of effect measures can have serious bias when the data lack adequate case numbers for some combination of exposure and outcome levels. This bias can occur even in quite large datasets and is hence often termed sparse data bias. The bias can arise from or be worsened by regression adjustment for potentially confounding variables; in the extreme, the resulting estimates could be impossibly huge or even infinite values that are meaningless artefacts of data sparsity. Such estimate inflation might be obvious in light of background information, but is rarely noted, let alone accounted for, in research reports. We outline simple methods for detecting and dealing with the problem, focusing especially on penalised estimation, which can be easily performed with common software packages.
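As a rough illustration of the point made in the abstract, the sketch below computes an odds ratio from a 2x2 table that is large overall but has one empty cell, and compares it with the estimate obtained after adding 0.5 to every cell. The counts are hypothetical, the function names are invented for this example, and the 0.5 adjustment is a crude stand-in for the penalised methods the paper discusses, not the authors' own procedure.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if b * c == 0:
        # A zero in the denominator makes the ratio infinite (or undefined).
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def augmented_odds_ratio(a, b, c, d, k=0.5):
    """Odds ratio after adding k pseudo-observations to each cell.

    Assumption: k = 0.5 is used here only as a simple illustrative
    adjustment; it is not taken from the paper.
    """
    return odds_ratio(a + k, b + k, c + k, d + k)


# Hypothetical data: 1000 subjects, but no cases among the unexposed.
table = (5, 95, 0, 900)

ml_estimate = odds_ratio(*table)                  # infinite
adjusted_estimate = augmented_odds_ratio(*table)  # finite, though still large

print("Unadjusted odds ratio:", ml_estimate)
print("Adjusted odds ratio:  ", round(adjusted_estimate, 1))
```

The unadjusted estimate is infinite even though the dataset has 1000 subjects, because what matters is the smallest cell count rather than the total sample size. The adjusted estimate is finite but remains large, which suggests why a stronger, more carefully chosen penalty may be preferable in practice.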
Pages: 6