On-line monitoring data quality of high-dimensional data streams

Cited by: 10
Authors
Qi, Dequan [2 ]
Li, Zhonghua
Wang, Zhaojun [1 ]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Jilin Med Univ, Dept Math, Jilin, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data quality; false discovery rate; MEWMA; statistical process control; control charts; schemes; impact; time
DOI
10.1080/00949655.2015.1106542
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In recent years, effective monitoring of data quality has attracted increasing attention from researchers in statistical process control. None of the existing work on this topic uses multivariate methods to control the multidimensional data quality process; it relies instead on multiple univariate control charts. Based on a novel one-sided multivariate exponentially weighted moving average (MEWMA) chart, we propose a conditional false discovery rate-adjusted scheme for on-line monitoring of the data quality of high-dimensional data streams. With thousands of input data streams, the average run length loses its usefulness because out-of-control signals are likely at every time period. Hence, we first control the percentage of signals that are false alarms. We then compare the power of the proposed MEWMA scheme with that of two alternative methods. Numerical results show that the proposed MEWMA scheme has higher average power than both competitors.
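The idea of controlling the share of signals that are false alarms, rather than the average run length, can be illustrated with the standard Benjamini-Hochberg step-up procedure applied across streams at a single time point. This is only a minimal sketch and not the paper's conditional false discovery rate-adjusted MEWMA scheme: it assumes each stream yields a standardized statistic that is N(0, 1) when in control and that only upward shifts matter, and the names `upper_tail_p` and `benjamini_hochberg` are hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z (assumed in-control law)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of streams flagged by the Benjamini-Hochberg procedure.

    Finds the largest rank k with p_(k) <= alpha * k / m and flags the
    k streams with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Illustrative (made-up) standardized statistics for 8 streams at one time point.
z_scores = [0.1, -0.4, 3.9, 0.8, 4.2, -1.1, 0.3, 2.0]
flagged = benjamini_hochberg([upper_tail_p(z) for z in z_scores], alpha=0.05)
print(flagged)
```

In this toy input the stream with statistic 2.0 would signal under an unadjusted 5% one-sided test but is not flagged after the adjustment, which is the kind of trade-off a false-discovery-rate criterion makes when many streams are monitored at once.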
Pages: 2204-2216
Number of pages: 13