PARTITIONED APPROACH FOR HIGH-DIMENSIONAL CONFIDENCE INTERVALS WITH LARGE SPLIT SIZES

Cited by: 1
Authors
Zheng, Zemin [1 ]
Zhang, Jiarui [1 ]
Li, Yang [1 ]
Wu, Yaohua [1 ]
Affiliations
[1] Univ Sci & Technol, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big data; confidence intervals; de-biased estimator; divide and conquer; large split sizes; scalability; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; DANTZIG SELECTOR; REGRESSION; SHRINKAGE; INFERENCE; LASSO;
DOI
10.5705/ss.202018.0379
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract
With the availability of massive data sets, accurate inferences with low computational costs are the key to improving scalability. When the sample size and dimensionality are both large, naively applying de-biasing to derive confidence intervals can be computationally inefficient or infeasible, because the de-biasing procedure increases the computational cost by an order of magnitude compared with that of the initial penalized estimation. Therefore, we suggest a split and conquer approach to improve the scalability of the de-biasing procedure, and show that the length of the established confidence interval is asymptotically the same as that using all of the data. Moreover, we demonstrate a significant improvement in the largest split size by separating the initial estimation and the relaxed projection steps, indicating that the sample sizes needed for these two steps with statistical guarantees are different. We propose a refined inference procedure to address the inflation issue in the finite sample performance when the split size becomes large. Lastly, numerical studies demonstrate the computational advantage and theoretical guarantee of our new methodology.
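The split-and-conquer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names (`lasso_cd`, `debias_one_split`, `split_debiased_ci`) are hypothetical, a ridge-regularized inverse of the sample covariance stands in for the relaxed projection step, the demo uses fewer covariates than observations per split so that this inverse is well conditioned, and the refined inference procedure for large split sizes is not implemented. Under those assumptions, each split computes a lasso estimate, corrects its bias, and the corrected estimates are averaged to form intervals.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]        # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]        # put the updated coordinate back
    return beta

def debias_one_split(X, y, lam, ridge=1e-3):
    """De-biased estimate and per-coordinate variance on one data split."""
    m, p = X.shape
    beta = lasso_cd(X, y, lam)
    sigma_hat = X.T @ X / m
    # Assumption: ridge-regularized inverse as a stand-in for the
    # relaxed projection step mentioned in the abstract.
    theta = np.linalg.inv(sigma_hat + ridge * np.eye(p))
    resid = y - X @ beta
    b = beta + theta @ X.T @ resid / m        # one-step bias correction
    dof = max(m - np.count_nonzero(beta), 1)
    noise_var = resid @ resid / dof
    var = noise_var * np.diag(theta @ sigma_hat @ theta.T) / m
    return b, var

def split_debiased_ci(X, y, n_splits, lam, z=1.96):
    """Average de-biased estimates over splits; return (estimate, lower, upper)."""
    blocks = np.array_split(np.arange(X.shape[0]), n_splits)
    results = [debias_one_split(X[idx], y[idx], lam) for idx in blocks]
    est = np.mean([b for b, _ in results], axis=0)
    # Splits are disjoint, so the variance of the average is sum(var_k) / K^2.
    var = np.sum([v for _, v in results], axis=0) / n_splits ** 2
    half = z * np.sqrt(var)
    return est, est - half, est + half

# Illustrative synthetic data (not from the paper).
rng = np.random.default_rng(0)
N, p, K = 2000, 20, 4
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((N, p))
y = X @ beta_true + rng.standard_normal(N)
lam = np.sqrt(2 * np.log(p) / (N // K))
est, lower, upper = split_debiased_ci(X, y, n_splits=K, lam=lam)
print(np.round(est[:3], 2))
print(np.round((upper - lower)[:3], 3))
```

Calling `split_debiased_ci` with `n_splits=1` gives the full-data intervals, which makes it easy to compare interval lengths between the split and unsplit versions on the same data.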
Pages: 1935-1959
Number of pages: 25
Related papers
50 in total
  • [31] Honest Confidence Sets for High-Dimensional Regression by Projection and Shrinkage
    Zhou, Kun
    Li, Ker-Chau
    Zhou, Qing
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (541) : 469 - 488
  • [32] ON ASYMPTOTICALLY OPTIMAL CONFIDENCE REGIONS AND TESTS FOR HIGH-DIMENSIONAL MODELS
    Van de Geer, Sara
    Buehlmann, Peter
    Ritov, Ya'acov
    Dezeure, Ruben
    ANNALS OF STATISTICS, 2014, 42 (03) : 1166 - 1202
  • [33] INTEGRATIVE EXPLORATION OF LARGE HIGH-DIMENSIONAL DATASETS
    Pardy, Christopher
    Galbraith, Sally
    Wilson, Susan R.
    ANNALS OF APPLIED STATISTICS, 2018, 12 (01) : 178 - 199
  • [34] A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data
    Fang, Jianglin
    JOURNAL OF MULTIVARIATE ANALYSIS, 2023, 194
  • [35] A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations
    Neykov, Matey
    Ning, Yang
    Liu, Jun S.
    Liu, Han
    STATISTICAL SCIENCE, 2018, 33 (03) : 427 - 443
  • [36] Variance estimation and confidence intervals from genome-wide association studies through high-dimensional misspecified mixed model analysis
    Dao, Cecilia
    Jiang, Jiming
    Paul, Debashis
    Zhao, Hongyu
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2022, 220 : 15 - 23
  • [37] Imaging with Confidence: Uncertainty Quantification for High-Dimensional Undersampled MR Images
    Hoppe, Frederik
    Verdun, Claudio Mayrink
    Laus, Hannah
    Endt, Sebastian
    Menzel, Marion I.
    Krahmer, Felix
    Rauhut, Holger
    COMPUTER VISION - ECCV 2024, PT LXXVIII, 2025, 15136 : 432 - 450
  • [38] Honest confidence regions and optimality in high-dimensional precision matrix estimation
    Jana Janková
    Sara van de Geer
    TEST, 2017, 26 : 143 - 162
  • [39] Honest confidence regions and optimality in high-dimensional precision matrix estimation
    Jankova, Jana
    van de Geer, Sara
    TEST, 2017, 26 (01) : 143 - 162
  • [40] A Separability Marker Based on High-Dimensional Statistics for Classification Confidence Assessment
    Gayraud, Nathalie T. H.
    Foy, Nathanael
    Clerc, Maureen
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016 : 3193 - 3198