An adaptive approach for online monitoring of large-scale data streams

Cited by: 0
Authors
Cao, Shuchen [1]
Zhang, Ruizhi [2]
Affiliations
[1] Univ Nebraska Lincoln, Dept Stat, Lincoln, NE USA
[2] Univ Georgia, Dept Stat, Athens, GA USA
Keywords
False discovery rate; CUSUM; quickest change detection; process control; change-point detection; changepoint detection; schemes
DOI
10.1080/24725854.2023.2281580
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
In this article, we propose an adaptive top-r method to monitor large-scale data streams in which a change may affect an unknown subset of the streams at an unknown time. Motivated by parallel and distributed computing, we develop global monitoring schemes by running local detection procedures in parallel and then using the Benjamini-Hochberg false discovery rate control procedure to adaptively estimate the number of changed data streams. Our approach is illustrated with two concrete examples. The first is a homogeneous case in which all data streams are independent and identically distributed with the same known pre-change and post-change distributions. The second is a case in which all data are normally distributed and the mean shifts are unknown and may be positive or negative. Theoretically, we show that when the pre-change and post-change distributions are completely specified, the proposed method can estimate the number of changed data streams in both the pre-change and post-change states. Moreover, we perform simulations and two case studies to demonstrate its detection efficiency.
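The ingredients named in the abstract (local CUSUM statistics run per stream, a Benjamini-Hochberg step to estimate how many streams have changed, and a top-r aggregate) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the function names, the Gaussian mean-shift parameters `mu0`, `mu1`, `sigma`, and the level `alpha` are assumptions made for the example, and the per-stream p-values are taken as given because the abstract does not say how they are obtained from the local statistics.

```python
import numpy as np

def cusum_update(w_prev, x, mu0=0.0, mu1=1.0, sigma=1.0):
    """One CUSUM step per stream for a known Gaussian mean shift mu0 -> mu1.

    Assumed model for illustration only: W_n = max(0, W_{n-1} + log-likelihood ratio).
    """
    llr = (mu1 - mu0) / sigma**2 * (np.asarray(x, dtype=float) - (mu0 + mu1) / 2.0)
    return np.maximum(0.0, np.asarray(w_prev, dtype=float) + llr)

def bh_num_rejections(p_values, alpha=0.1):
    """Benjamini-Hochberg step-up rule: largest k with p_(k) <= alpha * k / m.

    The returned count serves here as a hypothetical estimate r of the number
    of changed streams.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    below = p <= alpha * np.arange(1, m + 1) / m
    return int(np.nonzero(below)[0].max() + 1) if below.any() else 0

def top_r_statistic(w, r):
    """Sum of the r largest local statistics (0.0 when r <= 0)."""
    if r <= 0:
        return 0.0
    return float(np.sort(np.asarray(w, dtype=float))[-r:].sum())
```

In use, each stream would update its own statistic with `cusum_update` (a step that parallelizes naturally across streams), `bh_num_rejections` would supply an adaptive value of r at each time step, and an alarm would be raised once `top_r_statistic` exceeds a threshold chosen by the user.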
Pages: 119-130
Page count: 12