Variable Selection for Distributed Sparse Regression Under Memory Constraints

Cited by: 0
|
Authors
Wang, Haofeng [1 ,2 ]
Jiang, Xuejun [2 ]
Zhou, Min [3 ]
Jiang, Jiancheng [4 ]
Affiliations
[1] Harbin Inst Technol, Dept Math, Harbin, Peoples R China
[2] Southern Univ Sci & Technol, Dept Stat & Data Sci, Shenzhen, Peoples R China
[3] Hong Kong Baptist Univ United Int Coll, Beijing Normal Univ, Zhuhai, Peoples R China
[4] Univ North Carolina Charlotte, Dept Math & Stat, Charlotte, NC USA
Keywords
Variable selection; Distributed sparse regression; Memory constraints; Distributed penalized likelihood algorithm; NONCONCAVE PENALIZED LIKELIHOOD; QUANTILE REGRESSION; STATISTICS;
DOI
10.1007/s40304-022-00291-w
Chinese Library Classification
O1 [Mathematics];
Discipline Codes
0701; 070101;
Abstract
This paper studies variable selection via the penalized likelihood method for distributed sparse regression with a large sample size n under a limited memory constraint, a problem of pressing importance in the big data era. A naive divide-and-conquer approach splits the whole data set into N parts, runs each part on one of N machines, aggregates the results from all machines by simple averaging, and obtains the selected variables from the aggregate. However, this approach tends to select more noise variables, and its false discovery rate may not be well controlled. We improve it with a specially designed weighted average in the aggregation step. Although the alternating direction method of multipliers can handle massive data, our proposed method greatly reduces the computational burden and achieves lower mean squared error in most cases. Theoretically, we establish asymptotic properties of the resulting estimators for likelihood models with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. The proposed method is further evaluated through simulations and a real-data example.
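The divide-and-conquer scheme described in the abstract can be illustrated with a minimal sketch: split the data into N shards, fit a sparse regression on each shard, and combine the shard estimates by a (possibly weighted) average. The code below is an illustrative assumption, not the paper's actual algorithm; it uses a simple coordinate-descent lasso in place of the paper's penalized likelihood solver, and the function names (`lasso_cd`, `distributed_estimate`) are hypothetical.

```python
# Illustrative divide-and-conquer sparse regression (not the paper's method):
# split the data into shards, run a lasso on each shard, then aggregate
# the shard estimates by a (weighted) average.
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso on one shard (simple stand-in solver)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j.
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j
            # Soft-thresholding update for the lasso penalty.
            beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_norm2[j]
    return beta

def distributed_estimate(X, y, n_shards, lam, weights=None):
    """Fit a lasso per shard, then aggregate estimates by weighted averaging.

    weights=None gives the naive equal-weight average; the paper's
    improvement corresponds to choosing data-driven weights instead.
    """
    betas = [lasso_cd(Xs, ys, lam)
             for Xs, ys in zip(np.array_split(X, n_shards),
                               np.array_split(y, n_shards))]
    B = np.vstack(betas)                      # shape (n_shards, p)
    if weights is None:
        weights = np.full(n_shards, 1.0 / n_shards)
    return weights @ B

# Small synthetic demo: 3 signal variables, 7 noise variables.
rng = np.random.default_rng(0)
n, p = 2000, 10
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)
beta_hat = distributed_estimate(X, y, n_shards=4, lam=0.1)
```

In this sketch the equal-weight average already shrinks noise coefficients toward zero on well-conditioned data; the point of the paper's weighted aggregation is that equal weights can let shard-level noise selections survive averaging, inflating the false discovery rate.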
Pages: 307-338
Page count: 32