The consequences of checking for zero-inflation and overdispersion in the analysis of count data

Cited by: 36

Authors:

Campbell, Harlan [1]

Affiliations:

[1] Univ British Columbia, Vancouver, BC, Canada

Source:

METHODS IN ECOLOGY AND EVOLUTION | 2021 / Vol. 12 / Issue 04

Keywords:

model selection bias; overdispersion; zero-inflated models; inflation; POISSON REGRESSION-MODEL; LIKELIHOOD RATIO; SCORE TEST; SELECTION; INFERENCE; ECOLOGY; TESTS; ASSUMPTIONS

DOI:

10.1111/2041-210X.13559

Chinese Library Classification (CLC) number:

Q14 [Ecology (Bioecology)]

Subject classification codes:

071012; 0713

Abstract:

Count data are ubiquitous in ecology and the Poisson generalized linear model (GLM) is commonly used to model the association between counts and explanatory variables of interest. When fitting this model to the data, one typically proceeds by first confirming that the model assumptions are satisfied. If the residuals appear to be overdispersed or if there is zero-inflation, key assumptions of the Poisson GLM may be violated and researchers will then typically consider alternatives to the Poisson GLM. An important question is whether the potential model selection bias introduced by this data-driven multi-stage procedure merits concern. Here we conduct a large-scale simulation study to investigate the potential consequences of model selection bias that can arise in the simple scenario of analysing a sample of potentially overdispersed, potentially zero-inflated, count data. Specifically, we investigate model selection procedures recently recommended by Blasco-Moreno et al. (2019) using either a series of score tests or information theoretic criteria to select the best model. We find that, when sample sizes are small, model selection based on preliminary score tests (or information theoretic criteria, e.g. AIC, BIC) can lead to potentially substantial inflation of false positive rates (i.e. type 1 error inflation). When sample sizes are sufficiently large, model selection based on preliminary score tests is not problematic. Ignoring the possibility of overdispersion and zero-inflation during data analyses can lead to invalid inference. However, if one does not have sufficient power to test for overdispersion and zero-inflation, post hoc model selection may also lead to substantial bias. This 'catch-22' suggests that, if sample sizes are small, a healthy skepticism is warranted whenever one rejects the null hypothesis of no association between a given outcome and covariate.
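
To make the two preliminary checks mentioned in the abstract concrete, the following is a minimal sketch of descriptive diagnostics for an intercept-only Poisson model. It is not the score-test procedure of Blasco-Moreno et al. (2019), and it is not taken from the paper; the function names `dispersion_index` and `zero_ratio` and the sample `counts` are hypothetical, chosen only for illustration.

```python
import math


def dispersion_index(counts):
    """Pearson dispersion statistic for an intercept-only Poisson model.

    Values near 1 are consistent with a Poisson distribution;
    values well above 1 suggest overdispersion.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    # Under a Poisson model the variance equals the mean, so each
    # squared residual is scaled by the fitted mean.
    pearson_chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return pearson_chi2 / (n - 1)


def zero_ratio(counts):
    """Observed number of zeros divided by the number expected under Poisson.

    Values well above 1 suggest possible zero-inflation.
    """
    n = len(counts)
    mean = sum(counts) / n
    expected_zeros = n * math.exp(-mean)  # n * P(Y = 0) for Poisson(mean)
    observed_zeros = sum(1 for y in counts if y == 0)
    return observed_zeros / expected_zeros


# Hypothetical sample of counts, for illustration only.
counts = [0, 0, 0, 0, 1, 2, 3, 10]
print(dispersion_index(counts))
print(zero_ratio(counts))
```

With a sample this small, such diagnostics are noisy, which is the setting in which the abstract warns that choosing the final model based on preliminary checks can inflate type 1 error.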

Pages: 665-680

Number of pages: 16

75 references in total

[1] Albers, 2019, META PSYCHOL, V3, P1592

[2] Retire statistical significance [J].