On-line monitoring data quality of high-dimensional data streams

Cited by: 10
Authors
Qi, Dequan [2 ]
Li, Zhonghua
Wang, Zhaojun [1 ]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Jilin Med Univ, Dept Math, Jilin, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data quality; false discovery rate; MEWMA; statistical process control; FALSE DISCOVERY RATE; CONTROL CHARTS; SCHEMES; IMPACT; TIME;
DOI
10.1080/00949655.2015.1106542
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In recent years, effective monitoring of data quality has increasingly attracted the attention of researchers in statistical process control. None of the existing research on this topic uses multivariate methods to control the multidimensional data quality process; it relies instead on multiple univariate control charts. Based on a novel one-sided multivariate exponentially weighted moving average (MEWMA) chart, we propose a conditional false discovery rate-adjusted scheme for on-line monitoring of the data quality of high-dimensional data streams. With thousands of input data streams, the average run length loses its usefulness because out-of-control signals are likely at each time period. Hence, we first control the percentage of signals that are false alarms. We then compare the power of the proposed MEWMA scheme with that of two alternative methods. Numerical results show that the proposed MEWMA scheme has higher average power than both competitors.
Pages: 2204-2216
Page count: 13
Related papers
50 in total
  • [21] Efficient global monitoring statistics for high-dimensional data
    Li, Jun
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2020, 36 (01) : 18 - 32
  • [22] ADAPTIVE CHANGE POINT MONITORING FOR HIGH-DIMENSIONAL DATA
    Wu, Teng
    Wang, Runmin
    Yan, Hao
    Shao, Xiaofeng
    STATISTICA SINICA, 2022, 32 (03) : 1583 - 1610
  • [23] High-dimensional data monitoring using support machines
    Maboudou-Tchao, Edgard M.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (07) : 1927 - 1942
  • [24] The partitioning ensemble control chart for on-line monitoring of high-dimensional image-based quality characteristics
    Yeganeh, Ali
    Johannssen, Arne
    Chukhrova, Nataliya
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [25] Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams
    Sui, Jinping
    Liu, Zhen
    Liu, Li
    Jung, Alexander
    Li, Xiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 4173 - 4186
  • [26] Efficient unsupervised drift detector for fast and high-dimensional data streams
    Vinicius M. A. Souza
    Antonio R. S. Parmezan
    Farhan A. Chowdhury
    Abdullah Mueen
    Knowledge and Information Systems, 2021, 63 : 1497 - 1527
  • [27] StreamSVC: A New Approach To Cluster Large And High-Dimensional Data Streams
    Saberi, Hasan
    Mehdiaghaei, Mohammadali
    WORLD CONGRESS ON ENGINEERING, WCE 2011, VOL III, 2011, : 1865 - 1870
  • [28] Anomaly detection in high-dimensional network data streams: A case study
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    ISI 2008: 2008 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS, 2008, : 251 - +
  • [29] A grid-based clustering algorithm for high-dimensional data streams
    Lu, YS
    Sun, YF
    Xu, GP
    Liu, G
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 824 - 831
  • [30] A Statistical Control Chart for Monitoring High-dimensional Poisson Data Streams
    Wang, Zhiyuan
    Li, Yanting
    Zhou, Xiaojun
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2017, 33 (02) : 307 - 321