Extended Poisson-Tweedie: Properties and regression models for count data

Cited by: 62
Authors
Bonat, Wagner H. [1, 2]
Jorgensen, Bent [2]
Kokonendji, Celestin C. [3]
Hinde, John [4]
Demetrio, Clarice G. B. [5]
Affiliations
[1] Univ Fed Parana, Dept Stat, Lab Stat & Geoinformat, Curitiba, Parana, Brazil
[2] Univ Southern Denmark, Dept Math & Comp Sci, Odense, Denmark
[3] Bourgogne Franche Comte Univ, Lab Math Besancon, Besancon, France
[4] Natl Univ Ireland Galway, Sch Math Stat & Appl Math, Galway, Ireland
[5] Univ Sao Paulo, Dept Ciencias Exatas, Escola Super Agr Luiz de Queiroz, Piracicaba, Brazil
Keywords
count data; estimating functions; overdispersion; underdispersion; Poisson-Tweedie distribution; longitudinal data
DOI
10.1177/1471082X17715718
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form $\mu + \phi \mu^{p}$, where $\mu$ is the mean and $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for the estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Polya-Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson-Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter $\phi$. Furthermore, the Poisson-Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under-, equi-, overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability of the Poisson-Tweedie distributions to deal with count data through a consideration of dispersion, zero-inflated and heavy tail indices, and illustrate its application with four data analyses. We provide an R implementation and the datasets as supplementary materials.
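The estimating function idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and it makes several simplifying assumptions that are not stated in the abstract: a log link, independent observations, and a power parameter p held fixed rather than estimated. The function names `pt_variance` and `fit_poisson_tweedie` are hypothetical. The regression coefficients are updated with a Fisher scoring step on the quasi-score function, and the dispersion parameter with a scoring-type step on the Pearson estimating function, using only the mean and the variance mu + phi * mu**p.

```python
import numpy as np


def pt_variance(mu, phi, p):
    """Poisson-Tweedie variance function: mu + phi * mu**p."""
    return mu + phi * mu ** p


def fit_poisson_tweedie(X, y, p, n_iter=100, tol=1e-8):
    """Alternate quasi-score and Pearson updates (sketch; p is held fixed).

    Assumes a log link and that the first column of X is an intercept.
    Returns the estimated regression coefficients and dispersion phi.
    """
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # intercept-only starting value
    phi = 0.0                   # start from the Poisson case
    for _ in range(n_iter):
        # Quasi-score step for beta: D' C^{-1} (y - mu), with D = dmu/dbeta.
        mu = np.exp(X @ beta)
        C = pt_variance(mu, phi, p)
        D = X * mu[:, None]
        score = D.T @ ((y - mu) / C)
        info = D.T @ (D / C[:, None])
        beta_new = beta + np.linalg.solve(info, score)

        # Pearson step for phi: sum W * ((y - mu)^2 - C), W = -dC^{-1}/dphi.
        mu = np.exp(X @ beta_new)
        C = pt_variance(mu, phi, p)
        W = mu ** p / C ** 2
        psi = np.sum(W * ((y - mu) ** 2 - C))
        sensitivity = np.sum(W * mu ** p)  # minus the expected derivative of psi
        phi_new = phi + psi / sensitivity

        change = max(np.max(np.abs(beta_new - beta)), abs(phi_new - phi))
        beta, phi = beta_new, phi_new
        if change < tol:
            break
    return beta, phi


# Usage: simulate negative binomial counts, whose variance is mu + phi * mu**2.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
mu_true = np.exp(X @ np.array([1.0, 0.5]))
size = 1.0 / 0.5  # negative binomial size parameter equals 1 / phi
y = rng.negative_binomial(size, size / (size + mu_true))
beta_hat, phi_hat = fit_poisson_tweedie(X, y, p=2.0)
print(beta_hat, phi_hat)
```

Because only the first two moments enter the updates, the same sketch runs with a negative value of phi as long as every variance mu + phi * mu**p stays positive, which is the route to underdispersion the abstract describes.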
Pages: 24-49
Number of pages: 26