High-dimensional changepoint estimation with heterogeneous missingness

Times cited: 6
Authors
Follain, Bertille [1, 2]
Wang, Tengyao [3, 4]
Samworth, Richard J. [1]
Affiliations
[1] Univ Cambridge, Stat Lab, Cambridge, Cambs, England
[2] PSL Res Univ, INRIA, Ecole Normale Super, Paris, France
[3] London Sch Econ & Polit Sci, Dept Stat, London, England
[4] UCL, Dept Stat Sci, London, England
Funding
European Research Council; Engineering and Physical Sciences Research Council (EPSRC, UK)
Keywords
changepoint estimation; high-dimensional data; missing data; segmentation; sparsity; change-point detection; maximum-likelihood estimation; binary segmentation; time series; sparse; number
DOI
10.1111/rssb.12540
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a 'MissCUSUM' transformation (a generalisation of the popular cumulative sum statistics) that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, each involving this weighted sum of squares; these terms represent the error incurred due to noise and the error due to missingness, respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data and on an oceanographic data set covering the Neogene period.
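The abstract describes a three-step pipeline. First, compute per-coordinate CUSUM-type statistics on partially observed data. Second, find a sparse projection direction. Third, take the peak of the absolute projected series. The following is a minimal sketch of that pipeline under stated assumptions.

It is not the authors' MissInspect implementation:
- `observed_cusum` is a hypothetical, simplified stand-in that computes an ordinary CUSUM contrast from the observed entries only. It is not the paper's MissCUSUM definition.
- The projection direction comes from power iteration followed by soft-thresholding. This is a swapped-in surrogate for the paper's penalised optimisation problem.
- All function names, the tuning parameter `lam` and the data layout (None marks a missing entry) are hypothetical.
- `weighted_signal_strength` computes the quantity named in the abstract: the sum over coordinates j of q_j θ_j², where q_j is the observation probability and θ_j is the mean change in coordinate j.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical data layout: p rows (coordinates) by n columns (time points);
# None marks an entry that was not observed.
Matrix = List[List[Optional[float]]]


def observed_cusum(row: Sequence[Optional[float]], t: int) -> float:
    """Scaled difference of observed means before/after split t (indices < t vs >= t).

    Simplified stand-in for a MissCUSUM-type statistic: missing entries are skipped,
    so the scaling uses observed counts. Not the paper's exact definition.
    """
    before = [x for x in row[:t] if x is not None]
    after = [x for x in row[t:] if x is not None]
    a, b = len(before), len(after)
    if a == 0 or b == 0:
        return 0.0  # no observed contrast for this coordinate at this split
    return math.sqrt(a * b / (a + b)) * (sum(before) / a - sum(after) / b)


def cusum_matrix(X: Matrix) -> List[List[float]]:
    """p x (n-1) matrix of statistics; column k corresponds to the split t = k + 1."""
    n = len(X[0])
    return [[observed_cusum(row, t) for t in range(1, n)] for row in X]


def _normalise(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def sparse_direction(T: List[List[float]], lam: float, iters: int = 50) -> List[float]:
    """Soft-thresholded leading left singular vector of T.

    Surrogate (power iteration, then soft-thresholding) for the paper's penalised
    optimisation problem; hypothetical, for illustration only.
    """
    p, m = len(T), len(T[0])
    v = _normalise([1.0] * p)
    for _ in range(iters):
        u = [sum(v[j] * T[j][k] for j in range(p)) for k in range(m)]  # u = T^T v
        v = _normalise([sum(T[j][k] * u[k] for k in range(m)) for j in range(p)])  # v = T u
    shrunk = [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]
    # Fall back to the unthresholded direction if lam removes every coordinate.
    return _normalise(shrunk) if any(shrunk) else v


def estimate_changepoint(X: Matrix, lam: float = 0.1) -> int:
    """Number of pre-change time points, chosen as argmax_t |v^T T[:, t]|."""
    T = cusum_matrix(X)
    v = sparse_direction(T, lam)
    projected = [abs(sum(v[j] * T[j][k] for j in range(len(T)))) for k in range(len(T[0]))]
    best = max(range(len(projected)), key=lambda k: projected[k])
    return best + 1  # column k corresponds to the split t = k + 1


def weighted_signal_strength(theta: Sequence[float], q: Sequence[float]) -> float:
    """Sum over j of q_j * theta_j^2: mean changes weighted by observation probabilities."""
    return sum(qj * tj * tj for qj, tj in zip(q, theta))


if __name__ == "__main__":
    rng = random.Random(0)
    p, n, z = 50, 100, 60
    q = [rng.uniform(0.3, 1.0) for _ in range(p)]  # per-coordinate observation probabilities
    theta = [1.5 if j < 5 else 0.0 for j in range(p)]  # sparse mean change in 5 coordinates
    X = [[(theta[j] if i >= z else 0.0) + rng.gauss(0.0, 1.0) if rng.random() < q[j] else None
          for i in range(n)] for j in range(p)]
    print("estimated changepoint:", estimate_changepoint(X, lam=0.1))
    print("weighted signal strength:", weighted_signal_strength(theta, q))
```

Skipping missing entries, rather than imputing them, means a rarely observed coordinate contributes a statistic built from fewer points. This is one intuitive way the observation probabilities enter the effective signal strength. The paper's MissCUSUM construction and its choice of penalty should be consulted for the actual method.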
Pages: 1023–1055
Number of pages: 33