Optimal distributed subsampling under heterogeneity

Times cited: 0
Authors
Shao, Yujing [1, 2]
Wang, Lei [1, 2]
Lian, Heng [3]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; REGRESSION
DOI
10.1007/s11222-024-10558-7
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Distributed subsampling approaches have been proposed to process massive data in distributed computing environments: subsamples are drawn from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we consider a setting in which each site involves a common parameter and site-specific nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex, possibly nonsmooth, loss functions. By establishing the consistency and asymptotic normality of the distributed subsample estimators of the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) algorithm and a random perturbation procedure are proposed to obtain the subsample estimator and to estimate the covariance matrices for statistical inference, respectively. The finite-sample performance of the proposed methods for linear regression, logistic regression and quantile regression models is demonstrated through simulation studies, and an application to the National Longitudinal Survey of Youth dataset is also provided.
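The abstract describes the two-step procedure only at a high level. As a minimal sketch under stated assumptions, the Python below illustrates the generic pattern for logistic regression: a uniform pilot subsample gives a pilot estimate, L-optimality-type scores |y - p(x)| * ||x|| (the familiar single-machine form for logistic regression) set the within-site sampling probabilities, the second-step budget is split across sites in proportion to their total scores, and an inverse-probability-weighted fit gives the final estimate. This is not the authors' method: it treats all parameters as common, ignores the site-specific nuisance parameters, uses covariate shift as the only source of heterogeneity in the simulated data, and the function names (`sigmoid`, `fit_weighted_logistic`, `two_step_subsample`, `simulate_sites`) and all sizes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def fit_weighted_logistic(X, y, w, max_iter=50, tol=1e-8):
    """Newton-Raphson for the weighted logistic log-likelihood (hypothetical helper)."""
    w = w / w.mean()  # rescaling all weights leaves the maximizer unchanged
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(sites, r0, r, rng):
    """Hypothetical two-step distributed subsampling for logistic regression.

    sites: list of (X_k, y_k) arrays held at K sites; r0: pilot size; r: second-step size.
    Assumption: all parameters are common (no site-specific nuisance parameters).
    """
    N = sum(len(y) for _, y in sites)

    # Step 1: uniform pilot subsample from each site, proportional to site size.
    Xp, yp = [], []
    for X, y in sites:
        r0_k = max(1, round(r0 * len(y) / N))
        idx = rng.choice(len(y), size=r0_k, replace=True)
        Xp.append(X[idx])
        yp.append(y[idx])
    yp_all = np.concatenate(yp)
    beta0 = fit_weighted_logistic(np.vstack(Xp), yp_all, np.ones_like(yp_all))

    # Step 2: L-optimality-type scores |y - p(x)| * ||x|| evaluated at the pilot estimate.
    scores = [np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1) for X, y in sites]
    totals = np.array([s.sum() for s in scores])
    # Split the second-step budget across sites in proportion to their total score.
    alloc = np.maximum(1, np.round(r * totals / totals.sum()).astype(int))

    Xs, ys, ws = [], [], []
    for (X, y), s, r_k in zip(sites, scores, alloc):
        pi = s / s.sum()  # within-site sampling probabilities
        idx = rng.choice(len(y), size=r_k, replace=True, p=pi)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * pi[idx]))  # inverse-probability weights
    beta = fit_weighted_logistic(np.vstack(Xs), np.concatenate(ys), np.concatenate(ws))
    return beta0, beta, alloc


def simulate_sites(rng, sizes=(5000, 10000, 15000)):
    # Hypothetical data: common coefficients, covariate means shifted across sites.
    beta_true = np.array([0.5, -1.0, 1.0])  # intercept and two slopes
    sites = []
    for k, n_k in enumerate(sizes):
        Z = rng.normal(loc=0.5 * k, scale=1.0, size=(n_k, 2))
        X = np.column_stack([np.ones(n_k), Z])
        y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
        sites.append((X, y))
    return sites, beta_true


if __name__ == "__main__":
    sites, beta_true = simulate_sites(np.random.default_rng(0))
    beta0, beta_hat, alloc = two_step_subsample(sites, r0=600, r=3000, rng=np.random.default_rng(1))
    print("allocation:", alloc)
    print("pilot:", np.round(beta0, 3), "two-step:", np.round(beta_hat, 3), "truth:", beta_true)
```

Sampling with replacement keeps the inverse-probability weights simple. The A-optimality criterion, the treatment of site-specific nuisance parameters, and the ADMM and random-perturbation steps for nonsmooth losses described in the abstract are not reproduced in this sketch.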
Pages: 20