Optimal distributed subsampling under heterogeneity

Cited by: 0
Authors
Shao, Yujing [1,2]
Wang, Lei [1,2]
Lian, Heng [3]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; REGRESSION;
DOI
10.1007/s11222-024-10558-7
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment, where subsamples are taken from each site and then analyzed collectively to address statistical problems when the full data is not available. In this paper, we consider the setting in which each site involves a common parameter and site-specific nuisance parameters, and formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex loss functions that may be nonsmooth. By establishing the consistency and asymptotic normality of the distributed subsample estimators for the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resultant estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) and a random perturbation procedure are proposed to obtain the subsample estimator and to estimate the covariance matrices for statistical inference, respectively. The finite-sample performance for linear regression, logistic regression and quantile regression models is demonstrated through simulation studies, and an application to the National Longitudinal Survey of Youth dataset is also provided.
Pages: 20
Related papers
50 in total
  • [41] LIC: An R package for optimal subset selection for distributed data
    Chang, Di
    Guo, Guangbao
    SOFTWAREX, 2024, 28
  • [42] Distributed Optimal Power Flow using Feasible Point Pursuit
    Zamzam, Ahmed S.
    Fu, Xiao
    Dall'Anese, Emiliano
    Sidiropoulos, Nicholas D.
    2017 IEEE 7TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP), 2017,
  • [43] Nonparametric distributed learning under general designs
    Liu, Meimei
    Shang, Zuofeng
    Cheng, Guang
    ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (02): : 3070 - 3102
  • [44] On the Convergence of Distributed Subgradient Methods under Quantization
    Doan, Thinh T.
    Maguluri, Siva Theja
    Romberg, Justin
    2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 567 - 574
  • [45] Practical Considerations of DER Coordination with Distributed Optimal Power Flow
    Gebbran, Daniel
    Mhanna, Sleiman
    Chapman, Archie C.
    Hardjawana, Wibowo
    Vucetic, Branka
    Verbic, Gregor
    2020 INTERNATIONAL CONFERENCE ON SMART GRIDS AND ENERGY SYSTEMS (SGES 2020), 2020, : 209 - 214
  • [46] Properties of the QME under asymmetrically distributed disturbances
    Laitila, T
    STATISTICS & PROBABILITY LETTERS, 2001, 52 (04) : 347 - 352
  • [47] Optimal harvesting policy for biological resources with uncertain heterogeneity for application in fisheries management
    Yoshioka, Hidekazu
    NATURAL RESOURCE MODELING, 2024, 37 (02)
  • [48] A MOMENT-BASED TEST OF GENETIC LINKAGE UNDER HETEROGENEITY
    Ning, Wei
    JP JOURNAL OF BIOSTATISTICS, 2007, 1 (03) : 267 - 281
  • [49] Formation of local heterogeneity under energy collection in neural networks
    XIE Ying
    YAO Zhao
    MA Jun
    Science China(Technological Sciences), 2023, 66 (02) : 439 - 455
  • [50] Formation of local heterogeneity under energy collection in neural networks
    Ying Xie
    Zhao Yao
    Jun Ma
    Science China Technological Sciences, 2023, 66 : 439 - 455