Distributed hypothesis testing for large dimensional two-sample mean vectors

Cited by: 0
Authors
Yan, Lu [1, 2]
Hu, Jiang [1, 2]
Wu, Lixiu [1, 2]
Affiliations
[1] Northeast Normal Univ, KLASMOE, Renmin St, Changchun 130024, Jilin, Peoples R China
[2] Northeast Normal Univ, Sch Math & Stat, Renmin St, Changchun 130024, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed algorithm; Sample covariance matrices; Hypothesis testing; Asymptotic normality
DOI
10.1007/s11222-024-10489-3
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The advent of the big data era has brought massive datasets to the forefront of academic and industrial discussions. Because of high communication costs and long computation times, it may be difficult for traditional statistical methods to process such data centrally on a single server. A robust distributed system can effectively mitigate communication costs and enhance computational efficiency. However, the classical two-sample hypothesis testing problem in statistical analysis has not yet been fully developed within a distributed system framework. This paper explores the challenges of performing two-sample mean tests in a distributed framework, especially in the presence of unequal covariance matrices. By distributing samples across various nodes, we introduce two distributed test statistics: the blockwise linear two-sample test and the distributed two-sample test. Even when the sample size at each node is smaller than the dimension, the proposed test statistics maintain robust statistical properties. Both statistics are designed to reduce communication costs compared to the full-sample statistic. Simulation experiments and empirical analyses further confirm the favorable statistical properties of the proposed test statistics.
Pages: 31