Optimal distributed subsampling under heterogeneity

Citations: 0
Authors
Shao, Yujing [1, 2]
Wang, Lei [1, 2]
Lian, Heng [3]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
[3] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
ADMM; Heterogeneity; Nonsmooth loss; Random perturbation; Site-specific nuisance parameters; REGRESSION;
DOI
10.1007/s11222-024-10558-7
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment, where subsamples are taken from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we assume that each site involves a common parameter and site-specific nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex loss functions that may be nonsmooth. By establishing the consistency and asymptotic normality of the distributed subsample estimators for the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) is proposed to obtain the subsample estimator, and a random perturbation procedure is proposed to estimate the covariance matrices for statistical inference. The finite-sample performance of the method is demonstrated through simulation studies with linear regression, logistic regression and quantile regression models, and an application to the National Longitudinal Survey of Youth dataset is also provided.
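The two-step idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single site, ordinary least squares, and no nuisance parameters, so it is not the paper's distributed algorithm. The sampling weights (absolute pilot residual times covariate norm) are an assumed L-optimality-style form borrowed from the general optimal-subsampling literature, not the probabilities derived in the paper. The function name `two_step_subsample_ols` and the sizes `r0` and `r` are hypothetical.

```python
import numpy as np

def two_step_subsample_ols(X, y, r0, r, rng):
    """Illustrative two-step subsampling for least squares (single site)."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample gives a rough estimate.
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)
    # Step 2: assumed L-optimality-style scores, |residual| * ||x_i||.
    scores = np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights correct for the nonuniform sampling.
    sw = np.sqrt(1.0 / probs[idx])
    beta_hat, *_ = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)
    return beta_hat, probs

# Simulated data (hypothetical values chosen only for illustration).
rng = np.random.default_rng(0)
n = 20000
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, 3))
y = X @ beta_true + rng.normal(size=n)

beta_hat, probs = two_step_subsample_ols(X, y, r0=200, r=1000, rng=rng)
```

The weighted fit in the second step is what keeps the estimator targeted at the full-data solution even though informative points are oversampled. A distributed version along the lines of the abstract would additionally need per-site allocation sizes and site-specific nuisance parameters, which this sketch omits.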
Pages: 20