Communication-efficient distributed statistical inference on zero-inflated Poisson models

Citations: 2
Authors
Wan, Ran [1]
Bai, Yang [1]
Affiliation
[1] Shanghai Univ Finance & Econ, Sch Stat & Management, 777 Guoding Rd, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-inflated count; distributed EM algorithm; communication-efficient; MAXIMUM-LIKELIHOOD; REGRESSION; ALGORITHMS;
DOI
10.1080/24754269.2023.2263721
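Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract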
Zero-inflated count outcomes are common in many studies, such as claim-frequency counts in the insurance industry, where identifying and understanding excess zeros is of interest. Moreover, with advances in data collection and storage, datasets are often too large to be stored or processed by a single node or branch, so distributed data analysis is developing rapidly. In this paper, several communication-efficient distributed zero-inflated Poisson regression algorithms are developed to analyse this kind of large-scale zero-inflated data. Both the asymptotic properties of the proposed estimators and the complexity of the algorithms are studied. Simulation studies demonstrate that the proposed methods and algorithms work well and efficiently. Finally, in a case study, we apply the proposed algorithms to a car insurance dataset from Kaggle.
Pages: 81-106
Number of pages: 26