Distributed hypothesis testing for large dimensional two-sample mean vectors

Cited by: 0

Authors:

Yan, Lu [1,2]

Hu, Jiang [1,2]

Wu, Lixiu [1,2]

Affiliations:

[1] Northeast Normal Univ, KLASMOE, Renmin St, Changchun 130024, Jilin, Peoples R China

[2] Northeast Normal Univ, Sch Math & Stat, Renmin St, Changchun 130024, Jilin, Peoples R China

Source:

STATISTICS AND COMPUTING | 2024 / Vol. 34 / No. 6

Funding:

National Natural Science Foundation of China;

Keywords:

Distributed algorithm; Sample covariance matrices; Hypothesis testing; Asymptotic normality;

DOI:

10.1007/s11222-024-10489-3

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

The advent of the big data era has brought massive datasets to the forefront of academic and industrial discussions. Because of high communication costs and long computation times, it may be difficult for traditional statistical methods to process such data centrally on a single server. A robust distributed system can effectively mitigate communication costs and enhance computational efficiency. However, the classical two-sample hypothesis testing problem in statistical analysis has not yet been fully developed within a distributed system framework. This paper explores the challenges of performing two-sample mean tests in a distributed framework, especially in the presence of unequal covariance matrices. By distributing samples across various nodes, we introduce two distributed test statistics: the blockwise linear two-sample test and the distributed two-sample test. Even when the sample size at each node is smaller than the dimension, the proposed test statistics maintain robust statistical properties. Both statistics are designed to reduce communication costs compared with the full-sample statistic. Simulation experiments and empirical analyses further confirm the favorable statistical properties of the proposed test statistics.


Pages: 31

References (32 in total):

[1] Distributed computing with the cloud [J].