Variable Selection for Distributed Sparse Regression Under Memory Constraints

Cited: 0
Authors
Wang, Haofeng [1 ,2 ]
Jiang, Xuejun [2 ]
Zhou, Min [3 ]
Jiang, Jiancheng [4 ]
Affiliations
[1] Harbin Inst Technol, Dept Math, Harbin, Peoples R China
[2] Southern Univ Sci & Technol, Dept Stat & Data Sci, Shenzhen, Peoples R China
[3] Hong Kong Baptist Univ United Int Coll, Beijing Normal Univ, Zhuhai, Peoples R China
[4] Univ North Carolina Charlotte, Dept Math & Stat, Charlotte, NC USA
Keywords
Variable selection; Distributed sparse regression; Memory constraints; Distributed penalized likelihood algorithm; NONCONCAVE PENALIZED LIKELIHOOD; QUANTILE REGRESSION; STATISTICS;
DOI
10.1007/s40304-022-00291-w
Chinese Library Classification
O1 [Mathematics];
Discipline Code
0701 ; 070101 ;
Abstract
This paper studies variable selection using the penalized likelihood method for distributed sparse regression with a large sample size n under a limited memory constraint, a problem that urgently needs to be solved in the big data era. A naive divide-and-conquer approach splits the whole data set into N parts, runs each part on one of N machines, aggregates the results from all machines via averaging, and finally obtains the selected variables. However, this approach tends to select more noise variables, and the false discovery rate may not be well controlled. We improve it with a specially designed weighted average in the aggregation step. Although the alternating direction method of multipliers has been used to handle massive data in the literature, our proposed method substantially reduces the computational burden and achieves lower mean squared error in most cases. Theoretically, we establish asymptotic properties of the resulting estimators for likelihood models with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. Furthermore, the proposed method is evaluated by simulations and a real example.
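The divide-and-conquer scheme described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: each "machine" fits a Lasso on its data shard via iterative soft-thresholding (ISTA), and the local estimates are then combined by a plain average and by a weighted average (here the weights are simply proportional to local sample size; the paper's weights are specially designed to control false discoveries).

```python
# Hypothetical sketch of divide-and-conquer sparse regression: local Lasso
# fits aggregated by plain vs. weighted averaging. Names and tuning values
# (lasso_ista, lam=0.1, n_machines=5) are illustrative assumptions.
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by iterative soft-thresholding."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(0)
p, n_machines, n_local = 20, 5, 200
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]  # sparse true signal

# Each machine sees only its own shard of the data (memory constraint).
local_fits = []
for _ in range(n_machines):
    X = rng.standard_normal((n_local, p))
    y = X @ beta_true + rng.standard_normal(n_local)
    local_fits.append(lasso_ista(X, y, lam=0.1))

naive = np.mean(local_fits, axis=0)  # naive aggregation: plain average

# Weighted aggregation; with equal shard sizes this reduces to the naive
# average, but unequal or data-driven weights change the selected set.
w = np.full(n_machines, n_local, dtype=float)
w /= w.sum()
weighted = np.einsum('k,kp->p', w, np.array(local_fits))
```

Only the p-dimensional local estimates travel to the aggregator, so no machine ever holds the full n-by-p design matrix, which is the point of the memory-constrained setting.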
Pages: 307-338 (32 pages)
Related Papers (50 records)
  • [31] Nonnegative estimation and variable selection under minimax concave penalty for sparse high-dimensional linear regression models
    Li, Ning
    Yang, Hu
    STATISTICAL PAPERS, 2021, 62 (02) : 661 - 680
  • [33] Input selection for disturbance rejection under manipulated variable constraints
    Cao, Y
    Rossiter, D
    Owens, D
    COMPUTERS & CHEMICAL ENGINEERING, 1997, 21 : S403 - S408
  • [34] Data compression under constraints of causality and variable finite memory
    Torokhti, A.
    Miklavcic, S. J.
    SIGNAL PROCESSING, 2010, 90 (10) : 2822 - 2834
  • [35] Bayesian variable selection approach to a Bernstein polynomial regression model with stochastic constraints
    Choi, Taeryon
    Kim, Hea-Jung
    Jo, Seongil
    JOURNAL OF APPLIED STATISTICS, 2016, 43 (15) : 2751 - 2771
  • [36] An adaptive sparse distributed memory
    Aguilar, JL
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2197 - 2201
  • [37] Convergence in a sparse distributed memory
    Sjödin, G
    VTH BRAZILIAN SYMPOSIUM ON NEURAL NETWORKS, PROCEEDINGS, 1998, : 165 - 168
  • [38] Extended Sparse Distributed Memory
    Snaider, Javier
    Franklin, Stan
    BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES 2011, 2011, 233 : 351 - +
  • [39] Variable Screening for Sparse Online Regression
    Liang, Jingwei
    Poon, Clarice
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (01) : 275 - 293
  • [40] Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data
    O'Shea, Robert J.
    Tsoka, Sophia
    Cook, Gary J. R.
    Goh, Vicky
    CANCER INFORMATICS, 2021, 20