DISTRIBUTED STATISTICAL INFERENCE FOR MASSIVE DATA

Cited by: 27
Authors
Chen, Song Xi [1, 2]
Peng, Liuhua [3 ]
Affiliations
[1] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
[2] Peking Univ, Ctr Stat Sci, Beijing, Peoples R China
[3] Univ Melbourne, Sch Math & Stat, Melbourne, Vic, Australia
Funding
National Natural Science Foundation of China
Keywords
Distributed bootstrap; distributed statistics; massive data; pseudo-distributed bootstrap; EDGEWORTH EXPANSIONS; BOOTSTRAP METHODS; REGRESSION
DOI
10.1214/21-AOS2062
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
This paper considers distributed statistical inference for general symmetric statistics in the context of massive data with efficient computation. Estimation efficiency and asymptotic distributions of the distributed statistics are provided; these reveal different results in the nondegenerate and degenerate cases and show that the number of data subsets plays an important role. Two distributed bootstrap methods are proposed and analyzed to approximate the underlying distribution of the distributed statistics with improved computational efficiency over existing methods. The accuracy of the distributional approximation by the bootstrap is studied theoretically. One of the methods, the pseudo-distributed bootstrap, is particularly attractive if the number of datasets is large, as it directly resamples the subset-based statistics and assumes less stringent conditions, and its performance can be improved by studentization.
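The idea described in the abstract (compute a symmetric statistic on each data subset, average the results, then resample the subset-level statistics directly) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the equal-size blocking, the choice of the sample variance as the statistic, and the numbers of subsets and resamples are all assumptions made for illustration, and the sketch omits any centering, scaling, or studentization the paper may use.

```python
import random
import statistics


def distributed_statistic(data, num_subsets, stat):
    """Average a statistic computed separately on num_subsets equal blocks.

    Hypothetical helper: any remainder after equal blocking is dropped.
    """
    block = len(data) // num_subsets
    subset_stats = [
        stat(data[k * block:(k + 1) * block]) for k in range(num_subsets)
    ]
    return statistics.fmean(subset_stats), subset_stats


def pseudo_distributed_bootstrap(subset_stats, num_resamples, rng):
    """Resample the subset-level statistics with replacement and re-average.

    Hypothetical helper: only the K subset statistics are resampled,
    never the raw data.
    """
    k = len(subset_stats)
    return [
        statistics.fmean(rng.choices(subset_stats, k=k))
        for _ in range(num_resamples)
    ]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# The sample variance is a symmetric statistic (a U-statistic of degree 2).
estimate, subset_stats = distributed_statistic(data, 50, statistics.variance)

# The spread of the replicates gives a rough standard error for the estimate.
replicates = pseudo_distributed_bootstrap(subset_stats, 500, rng)
boot_se = statistics.stdev(replicates)
```

Because each replicate only re-averages the 50 stored subset statistics, the resampling cost in this sketch does not grow with the size of the full data set, which matches the abstract's remark that the approach is attractive when the number of subsets is large.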
Pages: 2851-2869
Number of pages: 19