On-line monitoring data quality of high-dimensional data streams

Cited by: 10
Authors
Qi, Dequan [2]
Li, Zhonghua
Wang, Zhaojun [1]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Jilin Med Univ, Dept Math, Jilin, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data quality; false discovery rate; MEWMA; statistical process control; control charts; schemes; impact; time
DOI
10.1080/00949655.2015.1106542
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In recent years, effective monitoring of data quality has attracted increasing attention from researchers in statistical process control. None of the existing work on this topic uses multivariate methods to control the multidimensional data quality process; it relies instead on multiple univariate control charts. Based on a novel one-sided multivariate exponentially weighted moving average (MEWMA) chart, we propose a conditional false discovery rate-adjusted scheme for on-line monitoring of the data quality of high-dimensional data streams. With thousands of input data streams, the average run length loses its usefulness because out-of-control signals are likely at every time period. Hence, we first control the percentage of signals that are false alarms, and then compare the power of the proposed MEWMA scheme with that of two alternative methods. Numerical results show that the proposed MEWMA scheme has higher average power than both competitors.
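The general idea of controlling the share of false alarms across many streams can be illustrated with a minimal sketch. It is not the authors' scheme: the abstract does not specify the one-sided MEWMA statistic or the conditional false discovery rate adjustment, so this example substitutes a plain univariate EWMA per stream and the standard Benjamini-Hochberg step-up procedure. The function names, the smoothing constant `lam`, the level `alpha`, and the assumption of independent N(0, 1) in-control observations are all hypothetical choices made for illustration.

```python
import math
import random


def ewma_update(prev, x, lam=0.2):
    """One EWMA step: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    return lam * x + (1.0 - lam) * prev


def upper_tail_pvalue(z, lam, t):
    """One-sided p-value of an EWMA value after t steps.

    Assumes z_0 = 0 and independent N(0, 1) in-control observations,
    so that Var(z_t) = lam / (2 - lam) * (1 - (1 - lam)^(2t)).
    """
    var = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
    return 0.5 * math.erfc(z / math.sqrt(2.0 * var))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of streams flagged by the BH step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is under its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def demo(n_streams=200, n_shifted=10, steps=50, lam=0.2, alpha=0.05, seed=0):
    """Simulate streams where the first n_shifted have an upward mean shift."""
    rng = random.Random(seed)
    z = [0.0] * n_streams
    for _ in range(steps):
        for i in range(n_streams):
            mean = 2.0 if i < n_shifted else 0.0  # hypothetical shift size
            z[i] = ewma_update(z[i], rng.gauss(mean, 1.0), lam)
    pvals = [upper_tail_pvalue(zi, lam, steps) for zi in z]
    return benjamini_hochberg(pvals, alpha)


if __name__ == "__main__":
    print("flagged streams:", demo())
```

Calling `demo()` returns the indices of the streams flagged at the final time step. Flagging by a false discovery rate threshold rather than a fixed control limit reflects the point made in the abstract: with thousands of streams, some signal is expected at almost every period, so the proportion of false alarms among the signals is the more useful quantity to control.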
Pages: 2204-2216
Number of pages: 13