Variable Selection for Distributed Sparse Regression Under Memory Constraints

Cited: 0
Authors
Wang, Haofeng [1 ,2 ]
Jiang, Xuejun [2 ]
Zhou, Min [3 ]
Jiang, Jiancheng [4 ]
Affiliations
[1] Harbin Inst Technol, Dept Math, Harbin, Peoples R China
[2] Southern Univ Sci & Technol, Dept Stat & Data Sci, Shenzhen, Peoples R China
[3] Hong Kong Baptist Univ United Int Coll, Beijing Normal Univ, Zhuhai, Peoples R China
[4] Univ North Carolina Charlotte, Dept Math & Stat, Charlotte, NC USA
Keywords
Variable selection; Distributed sparse regression; Memory constraints; Distributed penalized likelihood algorithm; NONCONCAVE PENALIZED LIKELIHOOD; QUANTILE REGRESSION; STATISTICS;
DOI
10.1007/s40304-022-00291-w
Chinese Library Classification
O1 [Mathematics];
Discipline Code
0701 ; 070101 ;
Abstract
This paper studies variable selection using the penalized likelihood method for distributed sparse regression with a large sample size n under a limited memory constraint, a problem that urgently needs to be solved in the big data era. A naive divide-and-conquer approach splits the whole data set into N parts, runs each part on one of N machines, aggregates the results from all machines via averaging, and finally obtains the selected variables. However, this approach tends to select more noise variables, and the false discovery rate may not be well controlled. We improve it with a specially designed weighted average in the aggregation step. Although the alternating direction method of multipliers has been used to handle massive data in the literature, our proposed method substantially reduces the computational burden and achieves lower mean squared error in most cases. Theoretically, we establish asymptotic properties of the resulting estimators for likelihood models with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. Furthermore, the proposed method is evaluated by simulations and a real example.
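The divide-and-conquer scheme described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: each "machine" fits a Lasso on its data shard via iterative soft-thresholding (ISTA), and the local estimates are then combined by a plain average and by a weighted average (here the weights are simply proportional to local sample size; the paper's weights are specially designed to control false discoveries).

```python
# Hypothetical sketch of divide-and-conquer sparse regression: local Lasso
# fits aggregated by plain vs. weighted averaging. Names and tuning values
# (lasso_ista, lam=0.1, n_machines=5) are illustrative assumptions.
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by iterative soft-thresholding."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(0)
p, n_machines, n_local = 20, 5, 200
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]  # sparse true signal

# Each machine sees only its own shard of the data (memory constraint).
local_fits = []
for _ in range(n_machines):
    X = rng.standard_normal((n_local, p))
    y = X @ beta_true + rng.standard_normal(n_local)
    local_fits.append(lasso_ista(X, y, lam=0.1))

naive = np.mean(local_fits, axis=0)  # naive aggregation: plain average

# Weighted aggregation; with equal shard sizes this reduces to the naive
# average, but unequal or data-driven weights change the selected set.
w = np.full(n_machines, n_local, dtype=float)
w /= w.sum()
weighted = np.einsum('k,kp->p', w, np.array(local_fits))
```

Only the p-dimensional local estimates travel to the aggregator, so no machine ever holds the full n-by-p design matrix, which is the point of the memory-constrained setting.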
Pages: 307-338 (32 pages)
Related Papers (50 records)
  • [31] Nonnegative estimation and variable selection under minimax concave penalty for sparse high-dimensional linear regression models
    Li, Ning
    Yang, Hu
    STATISTICAL PAPERS, 2021, 62 (02) : 661 - 680
  • [33] Input selection for disturbance rejection under manipulated variable constraints
    Cao, Y
    Rossiter, D
    Owens, D
    COMPUTERS & CHEMICAL ENGINEERING, 1997, 21 : S403 - S408
  • [34] Data compression under constraints of causality and variable finite memory
    Torokhti, A.
    Miklavcic, S. J.
    SIGNAL PROCESSING, 2010, 90 (10) : 2822 - 2834
  • [35] Bayesian variable selection approach to a Bernstein polynomial regression model with stochastic constraints
    Choi, Taeryon
    Kim, Hea-Jung
    Jo, Seongil
    JOURNAL OF APPLIED STATISTICS, 2016, 43 (15) : 2751 - 2771
  • [36] An adaptive sparse distributed memory
    Aguilar, JL
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2197 - 2201
  • [37] Convergence in a sparse distributed memory
    Sjödin, G
    VTH BRAZILIAN SYMPOSIUM ON NEURAL NETWORKS, PROCEEDINGS, 1998, : 165 - 168
  • [38] Extended Sparse Distributed Memory
    Snaider, Javier
    Franklin, Stan
    BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES 2011, 2011, 233 : 351 - +
  • [39] Variable Screening for Sparse Online Regression
    Liang, Jingwei
    Poon, Clarice
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (01) : 275 - 293
  • [40] Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data
    O'Shea, Robert J.
    Tsoka, Sophia
    Cook, Gary J. R.
    Goh, Vicky
    CANCER INFORMATICS, 2021, 20