Variable Selection for Distributed Sparse Regression Under Memory Constraints

Authors
Wang, Haofeng [1, 2]
Jiang, Xuejun [2 ]
Zhou, Min [3 ]
Jiang, Jiancheng [4 ]
Affiliations
[1] Harbin Inst Technol, Dept Math, Harbin, Peoples R China
[2] Southern Univ Sci & Technol, Dept Stat & Data Sci, Shenzhen, Peoples R China
[3] Hong Kong Baptist Univ United Int Coll, Beijing Normal Univ, Zhuhai, Peoples R China
[4] Univ North Carolina Charlotte, Dept Math & Stat, Charlotte, NC USA
Keywords
Variable selection; Distributed sparse regression; Memory constraints; Distributed penalized likelihood algorithm; Nonconcave penalized likelihood; Quantile regression; Statistics
DOI
10.1007/s40304-022-00291-w
Chinese Library Classification
O1 [Mathematics];
Subject classification codes
0701; 070101
Abstract
This paper studies variable selection by the penalized likelihood method for distributed sparse regression with a large sample size n under a limited memory constraint, a problem of pressing importance in the big data era. A naive divide-and-conquer method for this problem splits the whole data set into N parts, runs each part on one of N machines, aggregates the results from all machines by averaging, and finally obtains the selected variables. However, this method tends to select more noise variables, and the false discovery rate may not be well controlled. We improve it by using a specially designed weighted average in the aggregation step. Although the alternating direction method of multipliers can be used to handle massive data, our proposed method substantially reduces the computational burden and achieves a lower mean squared error in most cases. Theoretically, we establish asymptotic properties of the resulting estimators for likelihood models with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator has the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. Finally, the proposed method is evaluated by simulations and a real data example.
Pages: 307-338
Number of pages: 32
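
The naive divide-and-conquer scheme described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: a lasso penalty with squared-error loss, fitted by coordinate descent, stands in for the general penalized likelihood, and the function names (`soft_threshold`, `lasso_cd`, `naive_dc_select`, `make_data`), the simulated data, and the tuning value `lam=0.1` are all hypothetical. The paper's specially designed weighted average is not reproduced here, because the abstract does not specify it.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1 (assumed stand-in
    for the penalized likelihood in the abstract).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def naive_dc_select(X, y, n_machines, lam):
    """Split the rows into n_machines parts, fit each part, average."""
    n, p = len(X), len(X[0])
    size = n // n_machines
    local = []
    for k in range(n_machines):
        lo = k * size
        hi = n if k == n_machines - 1 else (k + 1) * size
        local.append(lasso_cd(X[lo:hi], y[lo:hi], lam))
    avg = [sum(b[j] for b in local) / n_machines for j in range(p)]
    # A variable is selected whenever its averaged coefficient is nonzero.
    selected = [j for j in range(p) if avg[j] != 0.0]
    return avg, selected, local


def make_data(n, beta, seed=0):
    """Simulated linear model with standard normal design and noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, 1.0)
         for row in X]
    return X, y


# Hypothetical setup: two signal variables, six noise variables.
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
X, y = make_data(400, true_beta)
avg, selected, local = naive_dc_select(X, y, n_machines=4, lam=0.1)
print("selected after averaging:", selected)
for k, b in enumerate(local):
    print("machine", k, "support:", [j for j, v in enumerate(b) if v != 0.0])
```

Barring exact cancellation, an averaged coefficient is nonzero whenever at least one machine keeps that variable, so the averaged support is essentially the union of the local supports. This is consistent with the abstract's remark that plain averaging tends to select more noise variables.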