Randomized maximum-contrast selection: Subagging for large-scale regression

Cited by: 7
Author
Bradic, Jelena [1]
Affiliation
[1] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
ELECTRONIC JOURNAL OF STATISTICS | 2016 / Vol. 10 / No. 1
Keywords
STABILITY SELECTION; VARIABLE SELECTION; LASSO; INEQUALITIES; CONSISTENCY; BOOTSTRAP; DESIGNS
DOI
10.1214/15-EJS1085
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We introduce a general method for variable selection in a large-scale regression setting where both the number of parameters and the number of samples are extremely large. The proposed method is based on a careful combination of penalized estimators, each applied to a random projection of the sample space into a low-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks, each consisting of only a small portion of the original data. Within each block we select the projection yielding the smallest out-of-sample error. Our random ensemble estimator then aggregates the results according to a new maximal-contrast voting scheme to determine the final selected set. Our theoretical results show how increasing the number of non-overlapping blocks affects performance. Moreover, we demonstrate that statistical optimality is retained along with the computational speedup: the proposed method achieves minimax rates for approximate recovery over all estimators that use the full set of samples. Furthermore, our theoretical results allow the number of subsamples to grow with the subsample size and do not require the irrepresentable condition. The estimator is also compared empirically with several other popular high-dimensional estimators via an extensive simulation study, which reveals its excellent finite-sample performance.
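The block-wise subagging idea in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the author's implementation, and it rests on several assumptions: the Lasso (fitted by plain coordinate descent) is used as the penalized estimator; each "random projection" is represented by a random half-split of a block's rows (fit on one half, score on the other); and the paper's maximal-contrast voting rule, which the abstract does not define, is replaced by a plain strict-majority vote. All function names and parameter values (`lam`, `n_blocks`, `n_proj`, `make_data`) are illustrative choices.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes it
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of beta on (X, y)."""
    errs = [(yi - sum(a * b for a, b in zip(xi, beta))) ** 2 for xi, yi in zip(X, y)]
    return sum(errs) / len(errs)


def best_support_in_block(X, y, block, lam, n_proj, rng):
    """Try n_proj random half-splits of one block (assumed stand-in for the
    paper's random projections) and return the Lasso support of the split
    with the smallest held-out error."""
    best_err, best_support = float("inf"), set()
    for _ in range(n_proj):
        rows = block[:]
        rng.shuffle(rows)
        half = len(rows) // 2
        train, holdout = rows[:half], rows[half:]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        err = mse([X[i] for i in holdout], [y[i] for i in holdout], beta)
        if err < best_err:
            best_err = err
            best_support = {j for j, b in enumerate(beta) if abs(b) > 1e-8}
    return best_support


def subagged_selection(X, y, n_blocks=5, n_proj=3, lam=0.2, seed=0):
    """Split rows into non-overlapping blocks, pick one support per block, then
    keep variables chosen by a strict majority of blocks (a plain vote used
    here in place of the paper's maximal-contrast voting)."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    size = len(order) // n_blocks
    blocks = [order[k * size:(k + 1) * size] for k in range(n_blocks)]
    votes = [0] * len(X[0])
    for block in blocks:
        for j in best_support_in_block(X, y, block, lam, n_proj, rng):
            votes[j] += 1
    selected = [j for j, v in enumerate(votes) if v > n_blocks / 2]
    return selected, votes


def make_data(n=500, p=8, noise=0.5, seed=1):
    """Hypothetical sparse linear model: only the first three coefficients are nonzero."""
    rng = random.Random(seed)
    true_beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(xi, true_beta)) + rng.gauss(0.0, noise) for xi in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    selected, votes = subagged_selection(X, y)
    print("selected:", selected, "votes:", votes)
```

Running the script prints the selected indices and per-variable vote counts. Each block only ever touches its own small slice of the rows, which is where the computational saving of the block design comes from; swapping the majority rule for a different aggregation only requires changing the last two lines of `subagged_selection`.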
Pages: 121-170
Number of pages: 50
Related papers
  • [1] ICOQ: Regression Proof Selection for Large-Scale Verification Projects
    Celik, Ahmet
    Palmskog, Karl
    Gligoric, Milos
    PROCEEDINGS OF THE 2017 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE'17), 2017: 171-182
  • [2] Randomized Sketching for Large-Scale Sparse Ridge Regression Problems
    Iyer, Chander
    Carothers, Christopher
    Drineas, Petros
    PROCEEDINGS OF SCALA 2016: 7TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS, 2016: 65-72
  • [3] TestSage: Regression Test Selection for Large-scale Web Service Testing
    Zhong, Hua
    Zhang, Lingming
    Khurshid, Sarfraz
    2019 IEEE 12TH CONFERENCE ON SOFTWARE TESTING, VALIDATION AND VERIFICATION (ICST 2019), 2019: 430-440
  • [4] Randomized Greedy Algorithms for Sensor Selection in Large-Scale Satellite Constellations
    Hibbard, Michael
    Hashemi, Abolfazl
    Tanaka, Takashi
    Topcu, Ufuk
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023: 4276-4283
  • [5] Optimal Minimax Variable Selection for Large-Scale Matrix Linear Regression Model
    Hao, Meiling
    Qu, Lianqiang
    Kong, Dehan
    Sun, Liuquan
    Zhu, Hongtu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [7] QUANTILE REGRESSION FOR LARGE-SCALE APPLICATIONS
    Yang, Jiyan
    Meng, Xiangrui
    Mahoney, Michael W.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05): S78-S110
  • [8] Large-Scale Sparse Logistic Regression
    Liu, Jun
    Chen, Jianhui
    Ye, Jieping
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009: 547-555
  • [9] Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks
    Li, Miaoqi
    Kang, Emily L.
    STATISTICS AND COMPUTING, 2019, 29 (05): 1165-1179