Scalable subsampling: computation, aggregation and inference

Cited by: 4
Authors
Politis, Dimitris [1]
Affiliations
[1] Univ Calif San Diego, Dept Math, 9500 Gilman Dr, La Jolla, CA 92093 USA
Funding
National Science Foundation (USA)
Keywords
Bagging; Big data; Bootstrap; Distributed inference; Subagging; Selection
DOI
10.1093/biomet/asad021
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Subsampling has seen a resurgence in the big data era where the standard, full-resample size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size b can be computationally challenging with both b and the sample size n being very large. This paper shows how a set of appropriately chosen, nonrandom subsamples can be used to conduct effective, and computationally feasible, subsampling distribution estimation. Furthermore, the same set of subsamples can be used to yield a procedure for subsampling aggregation, also known as subagging, that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same, or better, rate of convergence than that of $\hat{\theta}_n$. Statistical inference could then be based on the scalable subagging estimator instead of the original $\hat{\theta}_n$.
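To make the idea in the abstract concrete, the following is a minimal Python sketch of subagging over deterministic (nonrandom) subsamples. The abstract does not spell out how the subsamples are chosen, so the construction here is an assumption for illustration: contiguous blocks of length `b` whose start indices advance by a step `h`. The function names, the default estimator (the sample mean), the square-root rate, and the centering at the full-sample estimate are likewise hypothetical choices, not the paper's stated procedure.

```python
import random
import statistics
from typing import Callable, List, Sequence

Estimator = Callable[[Sequence[float]], float]


def block_subsamples(data: Sequence[float], b: int, h: int) -> List[Sequence[float]]:
    """Return deterministic blocks of length b whose start indices step by h.

    Assumption: contiguous blocks stand in for the "appropriately chosen,
    nonrandom subsamples" mentioned in the abstract.
    """
    if not (1 <= b <= len(data)) or h < 1:
        raise ValueError("need 1 <= b <= len(data) and h >= 1")
    q = (len(data) - b) // h + 1  # number of full blocks that fit
    return [data[j * h: j * h + b] for j in range(q)]


def subagging_estimate(data: Sequence[float], b: int, h: int,
                       estimator: Estimator = statistics.fmean) -> float:
    """Average the estimator over the deterministic blocks (subagging)."""
    values = [estimator(block) for block in block_subsamples(data, b, h)]
    return statistics.fmean(values)


def subsampling_roots(data: Sequence[float], b: int, h: int,
                      estimator: Estimator = statistics.fmean,
                      rate: Callable[[int], float] = lambda m: m ** 0.5) -> List[float]:
    """Sorted values rate(b) * (block estimate - full-sample estimate).

    Their empirical distribution is one way to approximate the sampling
    distribution of the estimator. Centering at the full-sample estimate is
    an illustrative choice and requires one pass over all the data.
    """
    center = estimator(data)
    return sorted(rate(b) * (estimator(block) - center)
                  for block in block_subsamples(data, b, h))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(1.0, 2.0) for _ in range(10_000)]
    # Non-overlapping blocks (h == b): each block can be processed independently.
    print(subagging_estimate(x, b=500, h=500))
```

With `h == b` the blocks do not overlap, so each one could be handled on a separate machine and only the per-block estimates need to be collected. Choosing `h < b` gives overlapping blocks and more of them, at a higher computational cost.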
Pages: 347-354
Page count: 8