Communication-efficient estimation for distributed subset selection

Times cited: 1
Authors
Chen, Yan [1]
Dong, Ruipeng [2]
Wen, Canhong [2]
Affiliations
[1] Univ Sci & Technol China, Sch Math Sci, Hefei 230026, Anhui, Peoples R China
[2] Univ Sci & Technol China, Int Inst Finance, Sch Management, Hefei 230026, Anhui, Peoples R China
Funding
National Science Foundation (USA); China Postdoctoral Science Foundation;
Keywords
Variable selection; Distributed computation; Discrete optimization; GIC; VARIABLE SELECTION; REGULARIZATION; REGRESSION; SHRINKAGE;
DOI
10.1007/s11222-023-10302-7
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Because of their large sample sizes and high dimensionality, modern data sets are usually stored in distributed systems, which poses unprecedented challenges for computation and statistical inference. Best subset selection is widely regarded as a benchmark method for handling high-dimensional data. However, efficient algorithms for best subset selection in distributed systems remain largely unstudied. To this end, we propose a new communication-efficient method for best subset selection in a distributed system. The proposed method restricts the information communicated among local machines to a moderately sized active set, which leads not only to efficient computation but also to a lower communication cost across the network of the distributed system. Moreover, we propose a new generalized information criterion (GIC) for tuning the sparsity level on the central machine. Under mild conditions, we establish the consistency of estimation and variable selection for the proposed estimator. We demonstrate the superiority of the proposed method through several numerical studies and a real data application in adolescent health.
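The abstract does not spell out the algorithm or the exact form of the criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a linear model, a marginal-correlation screening step to pick the active set, exhaustive best subset search inside that set, and a generic criterion of the form N log(RSS/N) + s log(p) log(log N). All function names, the screening rule, the penalty and the simulated data are hypothetical choices made for illustration.

```python
import itertools
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def local_scores(X, y):
    """Local machine: screening score |x_j^T y| / n for every column j."""
    n = len(y)
    return [abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(len(X[0]))]

def local_stats(X, y, active):
    """Local machine: summary statistics restricted to the active set only."""
    gram = [[sum(row[a] * row[b] for row in X) for b in active] for a in active]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in active]
    return gram, xty, sum(yi * yi for yi in y), len(y)

def fit_distributed(machines, p, k):
    """Central machine: screen, pool restricted statistics, tune sparsity."""
    # Round 1: each machine sends p scores; keep the k largest on average.
    scores = [0.0] * p
    for X, y in machines:
        for j, s in enumerate(local_scores(X, y)):
            scores[j] += s / len(machines)
    active = sorted(sorted(range(p), key=lambda j: -scores[j])[:k])
    # Round 2: each machine sends only k*k + k + 2 numbers.
    G = [[0.0] * k for _ in range(k)]
    b, yty, N = [0.0] * k, 0.0, 0
    for X, y in machines:
        g, v, t, n = local_stats(X, y, active)
        for i in range(k):
            b[i] += v[i]
            for j in range(k):
                G[i][j] += g[i][j]
        yty, N = yty + t, N + n
    # Exhaustive best subset inside the active set, scored by the assumed GIC.
    best = None
    for s in range(1, k + 1):
        for S in itertools.combinations(range(k), s):
            bS = [b[i] for i in S]
            coef = solve([[G[i][j] for j in S] for i in S], bS)
            rss = yty - sum(c * v for c, v in zip(coef, bS))
            gic = N * math.log(rss / N) + s * math.log(p) * math.log(math.log(N))
            if best is None or gic < best[0]:
                best = (gic, [active[i] for i in S], coef)
    return best

# Simulated example: 4 machines, 250 rows each, 30 features, 3 true signals.
rng = random.Random(0)
P, K = 30, 8
beta = [3.0, -2.0, 1.5] + [0.0] * (P - 3)
machines = []
for _ in range(4):
    X = [[rng.gauss(0, 1) for _ in range(P)] for _ in range(250)]
    y = [sum(bj * xj for bj, xj in zip(beta, row)) + rng.gauss(0, 0.5) for row in X]
    machines.append((X, y))

gic, support, coef = fit_distributed(machines, P, K)
print("selected features:", support)
```

In this sketch the second communication round costs each machine k*k + k + 2 numbers rather than a full p-by-p Gram matrix, which is the sense in which restricting communication to an active set lowers the network cost.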
Pages: 15