Optimal distributed subsampling under heterogeneity
Cited by: 0
Authors:
Shao, Yujing [1,2]
Wang, Lei [1,2]
Lian, Heng [3]
Affiliations:
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Keywords: ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; Regression
DOI: 10.1007/s11222-024-10558-7
Chinese Library Classification: TP301 [Theory, Methods]
Discipline classification code: 081202
Abstract:
Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment: subsamples are taken from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we consider the setting in which each site involves a common parameter and site-specific nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex loss functions that may be nonsmooth. By establishing the consistency and asymptotic normality of the distributed subsample estimators for the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) algorithm is proposed to obtain the subsample estimator, and a random perturbation procedure is proposed to estimate the covariance matrices for statistical inference. The finite-sample performance of the proposed methods is demonstrated for linear, logistic, and quantile regression models through simulation studies, and an application to the National Longitudinal Survey of Youth dataset is also provided.
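The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general two-step idea under loudly stated assumptions, not the authors' method. It assumes a linear model with a common slope beta and site-specific intercepts alpha_k standing in for the nuisance parameters; second-step probabilities proportional to |residual| * ||x||, an L-optimality-style rule borrowed from single-machine optimal subsampling rather than the probabilities derived in this paper; allocation sizes proportional to each site's total score; and a uniform mixing share `rho` so no point has zero probability. The names `fit_weighted_ls` and `two_step_subsample` are hypothetical.

```python
import numpy as np


def fit_weighted_ls(X, y, site, n_sites, w):
    """Weighted least squares: common slope beta plus one intercept per site."""
    D = np.eye(n_sites)[site]                  # one-hot site indicators
    Z = np.hstack([X, D])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    p = X.shape[1]
    return coef[:p], coef[p:]                  # (beta, site intercepts)


def two_step_subsample(sites, r0, r, rng, rho=0.1):
    """Hypothetical two-step distributed subsampling sketch (assumed rules).

    sites: list of (X_k, y_k) pairs, one per site.
    r0:    pilot size drawn uniformly at each site.
    r:     total second-step budget, split across sites by score mass.
    rho:   share of uniform probability mixed in (assumed safeguard).
    """
    K = len(sites)
    # Step 1: uniform pilot from every site, weighted by n_k / r0.
    Xs, ys, ss, ws = [], [], [], []
    for k, (X, y) in enumerate(sites):
        idx = rng.choice(len(y), size=r0, replace=True)
        Xs.append(X[idx]); ys.append(y[idx])
        ss.append(np.full(r0, k)); ws.append(np.full(r0, len(y) / r0))
    beta0, alpha0 = fit_weighted_ls(np.vstack(Xs), np.concatenate(ys),
                                    np.concatenate(ss), K, np.concatenate(ws))

    # Step 2: assumed L-optimality-style scores |residual| * ||x||,
    # computed locally at each site from the pilot fit.
    scores = [np.abs(y - alpha0[k] - X @ beta0) * np.linalg.norm(X, axis=1)
              for k, (X, y) in enumerate(sites)]
    total = sum(s.sum() for s in scores)

    Xs, ys, ss, ws = [], [], [], []
    for k, (X, y) in enumerate(sites):
        n_k = len(y)
        r_k = max(1, int(round(r * scores[k].sum() / total)))    # allocation size
        q = (1 - rho) * scores[k] / scores[k].sum() + rho / n_k  # within-site probs
        idx = rng.choice(n_k, size=r_k, replace=True, p=q)
        Xs.append(X[idx]); ys.append(y[idx])
        ss.append(np.full(r_k, k))
        ws.append(1.0 / (r_k * q[idx]))        # inverse expected draw count
    return fit_weighted_ls(np.vstack(Xs), np.concatenate(ys),
                           np.concatenate(ss), K, np.concatenate(ws))


# Simulated heterogeneous sites (illustrative data, not from the paper).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0])
alpha_true = [0.0, 3.0, -2.0]                  # site-specific nuisance intercepts
sites = []
for a, n_k in zip(alpha_true, [20000, 10000, 30000]):
    X = rng.standard_normal((n_k, 2))
    sites.append((X, a + X @ beta_true + rng.standard_normal(n_k)))

beta_hat, alpha_hat = two_step_subsample(sites, r0=200, r=2000, rng=rng)
print(beta_hat, alpha_hat)
```

The inverse weights 1/(r_k q_i) make each site's weighted subsample loss an approximately unbiased estimate of its full-data loss, which is why the common slope can be pooled across sites while each site keeps its own intercept. For nonsmooth losses such as the quantile loss, the least-squares step would have to be replaced by a different solver (the abstract mentions ADMM for this case).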