PARTITIONED APPROACH FOR HIGH-DIMENSIONAL CONFIDENCE INTERVALS WITH LARGE SPLIT SIZES

Cited by: 1
Authors
Zheng, Zemin [1]
Zhang, Jiarui [1]
Li, Yang [1]
Wu, Yaohua [1]
Affiliations
[1] Univ Sci & Technol, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big data; confidence intervals; de-biased estimator; divide and conquer; large split sizes; scalability; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; DANTZIG SELECTOR; REGRESSION; SHRINKAGE; INFERENCE; LASSO;
DOI
10.5705/ss.202018.0379
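Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract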
With the availability of massive data sets, accurate inferences with low computational costs are the key to improving scalability. When the sample size and dimensionality are both large, naively applying de-biasing to derive confidence intervals can be computationally inefficient or infeasible, because the de-biasing procedure increases the computational cost by an order of magnitude compared with that of the initial penalized estimation. Therefore, we suggest a split and conquer approach to improve the scalability of the de-biasing procedure, and show that the length of the established confidence interval is asymptotically the same as that using all of the data. Moreover, we demonstrate a significant improvement in the largest split size by separating the initial estimation and the relaxed projection steps, indicating that the sample sizes needed for these two steps with statistical guarantees are different. We propose a refined inference procedure to address the inflation issue in the finite sample performance when the split size becomes large. Lastly, numerical studies demonstrate the computational advantage and theoretical guarantee of our new methodology.
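The split-and-conquer idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure: the function names (`lasso_cd`, `debiased_split`, `split_and_conquer_ci`), the tuning values, and the use of a ridge-regularized inverse covariance in place of the relaxed projection step are all assumptions made for illustration. The sketch de-biases a lasso fit on each split, averages the de-biased estimates, and forms normal-approximation intervals whose width scales with the total sample size.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]            # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]            # add the updated coordinate back
    return beta

def debiased_split(X, y, lam, ridge=0.1):
    """De-biased lasso on one split (hypothetical helper).

    Assumption: a ridge-regularized inverse of the sample covariance stands in
    for the precision-matrix / relaxed projection step.
    """
    n, p = X.shape
    beta = lasso_cd(X, y, lam)
    sigma_hat = X.T @ X / n
    theta = np.linalg.inv(sigma_hat + ridge * np.eye(p))
    b = beta + theta @ X.T @ (y - X @ beta) / n   # one-step bias correction
    var_diag = np.diag(theta @ sigma_hat @ theta.T)
    return b, var_diag, beta

def split_and_conquer_ci(X, y, K, lam, z=1.96):
    """Average de-biased estimates over K splits and build approximate 95% CIs."""
    N, p = X.shape
    parts = np.array_split(np.arange(N), K)
    results = [debiased_split(X[idx], y[idx], lam) for idx in parts]
    b_bar = np.mean([r[0] for r in results], axis=0)
    v_bar = np.mean([r[1] for r in results], axis=0)
    beta_bar = np.mean([r[2] for r in results], axis=0)
    resid = y - X @ beta_bar
    sigma = np.sqrt(resid @ resid / N)            # crude noise-level estimate
    half = z * sigma * np.sqrt(v_bar / N)         # width shrinks with total N
    return np.column_stack([b_bar - half, b_bar + half])

# Simulated example (all settings are illustrative assumptions)
rng = np.random.default_rng(0)
N, p = 2000, 50
X = rng.standard_normal((N, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(N)
ci = split_and_conquer_ci(X, y, K=4, lam=0.1)
```

Here `ci[j]` holds the lower and upper endpoints for coefficient `j`. The sketch uses equal-sized splits so that averaging the per-split variances and dividing by the total sample size is a reasonable approximation; it does not implement the refined procedure the abstract mentions for large split sizes.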
Pages: 1935 - 1959
Page count: 25
Related papers
50 in total
  • [21] Conservative confidence intervals on multiple correlation coefficient for high-dimensional elliptical data using random projection methodology
    Najarzadeh, Dariush
    JOURNAL OF APPLIED STATISTICS, 2022, 49 (01) : 64 - 85
  • [22] Asymptotic Confidence Regions for High-Dimensional Structured Sparsity
    Stucky, Benjamin
    van de Geer, Sara
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (08) : 2178 - 2190
  • [23] Confidence intervals for low dimensional parameters in high dimensional linear models
    Zhang, Cun-Hui
    Zhang, Stephanie S.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2014, 76 (01) : 217 - 242
  • [24] StreamSVC: A New Approach To Cluster Large And High-Dimensional Data Streams
    Saberi, Hasan
    Mehdiaghaei, Mohammadali
    WORLD CONGRESS ON ENGINEERING, WCE 2011, VOL III, 2011 : 1865 - 1870
  • [25] Nonparametric confidence intervals for conditional quantiles with large-dimensional covariates
    Gardes, Laurent
    ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (01) : 661 - 701
  • [26] A BOOTSTRAP LASSO + PARTIAL RIDGE METHOD TO CONSTRUCT CONFIDENCE INTERVALS FOR PARAMETERS IN HIGH-DIMENSIONAL SPARSE LINEAR MODELS
    Liu, Hanzhong
    Xu, Xin
    Li, Jingyi Jessica
    STATISTICA SINICA, 2020, 30 (03) : 1333 - 1355
  • [27] Testing and confidence intervals for high dimensional proportional hazards models
    Fang, Ethan X.
    Ning, Yang
    Liu, Han
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2017, 79 (05) : 1415 - 1437
  • [28] High-confidence nonparametric fixed-width uncertainty intervals and applications to projected high-dimensional data and common mean estimation
    Steland, Ansgar
    Chang, Yuan-Tsung
    SEQUENTIAL ANALYSIS-DESIGN METHODS AND APPLICATIONS, 2021, 40 (01) : 97 - 124
  • [29] High-dimensional time irreversibility analysis of human interbeat intervals
    Hou, Feng Zhen
    Ning, Xin Bao
    Zhuang, Jian Jun
    Huang, Xiao Lin
    Fu, Mao Jing
    Bian, Chun Hua
    MEDICAL ENGINEERING & PHYSICS, 2011, 33 (05) : 633 - 637
  • [30] High-Dimensional DH Lehmer Problem over Quarter Intervals
    Zhang, Tianping
    ABSTRACT AND APPLIED ANALYSIS, 2014