Robust distributed modal regression for massive data

Cited by: 33
Authors
Wang, Kangning [1]
Li, Shaomin [2, 3]
Affiliations
[1] Shandong Technol & Business Univ, Sch Stat, Yantai, Peoples R China
[2] Beijing Normal Univ, Ctr Stat & Data Sci, Zhuhai, Peoples R China
[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Massive data; Robustness; Communication-efficient; Modal regression; Variable selection; Likelihood; LASSO
DOI
10.1016/j.csda.2021.107225
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Modal regression is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. This paper proposes a robust, communication-efficient distributed modal regression for massive distributed data. Specifically, the global modal regression objective function is approximated at the first machine by a surrogate that depends on the local datasets only through gradients. The resulting estimator is then computed at the first machine, while the other machines only need to calculate gradients, which significantly reduces the communication cost. Under mild conditions, asymptotic properties are established showing that the proposed estimator is statistically as efficient as the global modal regression estimator. Moreover, as a specific application, a penalized robust communication-efficient distributed modal regression procedure for variable selection is developed. Simulation results and a real data analysis are included to validate the method. (C) 2021 Elsevier B.V. All rights reserved.
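The gradient-only communication pattern described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel objective, the fixed bandwidth `h`, the least-squares pilot estimate, the fixed-step gradient ascent, and every function name below are assumptions chosen for illustration. The surrogate follows the general communication-efficient surrogate-likelihood idea: the first machine keeps its own local objective and adds a linear correction built from the gradients that all machines report at the pilot value.

```python
import numpy as np


def kernel_grad(X, y, beta, h):
    """Gradient of the kernel objective (1/n) * sum_i phi_h(y_i - x_i' beta),
    where phi_h is a Gaussian kernel with bandwidth h (an assumed choice)."""
    r = y - X @ beta
    w = np.exp(-0.5 * (r / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return X.T @ (w * r) / (len(y) * h ** 2)


def maximize(grad_fn, beta_init, step=1.0, n_iter=300):
    """Fixed-step gradient ascent; returns a local maximizer near beta_init."""
    beta = beta_init.copy()
    for _ in range(n_iter):
        beta = beta + step * grad_fn(beta)
    return beta


def distributed_modal_fit(shards, h=1.0):
    """One communication round: each machine sends one gradient vector."""
    X1, y1 = shards[0]
    # Pilot estimate computed on the first machine only (assumption: OLS).
    beta0 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    # Every machine evaluates its local gradient at the pilot value.
    local_grads = [kernel_grad(X, y, beta0, h) for X, y in shards]
    sizes = np.array([len(y) for _, y in shards], dtype=float)
    global_grad = np.average(local_grads, axis=0, weights=sizes)
    # Surrogate gradient: local gradient on machine 1 plus a constant shift.
    shift = global_grad - local_grads[0]
    return maximize(lambda b: kernel_grad(X1, y1, b, h) + shift, beta0)


# Simulated example: 5 machines, heavy-tailed symmetric noise.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0, -1.0])
shards = []
for _ in range(5):
    X = np.column_stack([np.ones(2000), rng.normal(size=(2000, 2))])
    y = X @ beta_true + rng.standard_t(3, size=2000)
    shards.append((X, y))

beta_dist = distributed_modal_fit(shards)

# Reference: the same objective maximized on the pooled data.
X_all = np.vstack([X for X, _ in shards])
y_all = np.concatenate([y for _, y in shards])
beta_pool = maximize(lambda b: kernel_grad(X_all, y_all, b, 1.0), beta_dist)
```

In this sketch only one vector of length 3 per machine crosses the network, while all iterative optimization happens on the first machine. Comparing `beta_dist` with `beta_pool` gives a rough sense of how close the surrogate solution is to the pooled-data solution in this simulated setting.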
Pages: 19
Related papers
50 in total
  • [41] Distributed Regression Analysis in a Distributed Health Data Network
    Malenfant, Jessica M.
    Her, Qoua L.
    Malek, Sarah
    Vilk, Yury
    Toh, Sengwee
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2017, 26 : 526 - 526
  • [42] Deterministic subsampling for logistic regression with massive data
    Song, Yan
    Dai, Wenlin
    COMPUTATIONAL STATISTICS, 2024, 39 (02) : 709 - 732
  • [43] Deterministic subsampling for logistic regression with massive data
    Yan Song
    Wenlin Dai
    Computational Statistics, 2024, 39 : 709 - 732
  • [44] Optimal subsampling for multiplicative regression with massive data
    Wang, Tianzhen
    Zhang, Haixiang
    STATISTICA NEERLANDICA, 2022, 76 (04) : 418 - 449
  • [45] Linear expectile regression under massive data
    Song, Shanshan
    Lin, Yuanyuan
    Zhou, Yong
    FUNDAMENTAL RESEARCH, 2021, 1 (05): : 574 - 585
  • [46] Segmented regression estimators for massive data sets
    Natarajan, R
    Pednault, E
    PROCEEDINGS OF THE SECOND SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2002, : 566 - 582
  • [47] ROBUST REGRESSION OF ENZYME KINETIC DATA
    CORNISHBOWDEN, A
    ENDRENYI, L
    BIOCHEMICAL JOURNAL, 1986, 234 (01) : 21 - 29
  • [48] Robust regression for data with multiple structures
    Chen, HF
    Meer, P
    Tyler, DE
    2001 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2001, : 1069 - 1075
  • [49] ROBUST REGRESSION WITH CENSORED-DATA
    BASAK, I
    NAVAL RESEARCH LOGISTICS, 1992, 39 (03) : 323 - 344
  • [50] Distributed regression for heterogeneous data sets
    Xing, Y
    Madden, MG
    Duggan, J
    Lyons, GJ
    ADVANCES IN INTELLIGENT DATA ANALYSIS V, 2003, 2810 : 544 - 553