On-line monitoring data quality of high-dimensional data streams

Cited by: 10
Authors
Qi, Dequan [2 ]
Li, Zhonghua
Wang, Zhaojun [1 ]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Jilin Med Univ, Dept Math, Jilin, Jilin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data quality; false discovery rate; MEWMA; statistical process control; FALSE DISCOVERY RATE; CONTROL CHARTS; SCHEMES; IMPACT; TIME;
DOI
10.1080/00949655.2015.1106542
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
In recent years, effective monitoring of data quality has increasingly attracted the attention of researchers in the area of statistical process control. None of the relevant research on this topic has used multivariate methods to control the multidimensional data quality process; existing work instead relies on multiple univariate control charts. Based on a novel one-sided multivariate exponentially weighted moving average (MEWMA) chart, we propose a conditional false discovery rate-adjusted scheme for on-line monitoring of the data quality of high-dimensional data streams. With thousands of input data streams, the average run length loses its usefulness because one will likely have out-of-control signals at each time period. Hence, we first control the percentage of signals that are false alarms. Then, we compare the power of the proposed MEWMA scheme with that of two alternative methods. Numerical results show that the proposed MEWMA scheme has higher average power than both competitors.
Pages: 2204-2216
Number of pages: 13
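
The abstract's core idea, controlling the share of signals that are false alarms rather than the average run length, can be illustrated with a rough sketch. This is not the authors' scheme: it swaps in one univariate one-sided EWMA statistic per stream and the standard Benjamini-Hochberg step-up procedure in place of the conditional false discovery rate-adjusted MEWMA chart. It also assumes independent in-control observations that are N(0, 1); all function names and parameter values (`lam`, `alpha`, the shift size) are hypothetical choices made for illustration.

```python
from math import erfc, sqrt

import numpy as np


def ewma_update(prev, obs, lam=0.2):
    """One EWMA step per stream: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    return lam * obs + (1.0 - lam) * prev


def upper_tail_pvalues(z, lam, t):
    """One-sided p-values, assuming i.i.d. N(0, 1) in-control data and z_0 = 0."""
    # Exact EWMA variance after t steps under the assumption above.
    var = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
    scaled = z / np.sqrt(var)
    # P(N(0, 1) > s) = 0.5 * erfc(s / sqrt(2))
    return np.array([0.5 * erfc(s / sqrt(2.0)) for s in scaled])


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of streams flagged by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


def demo(n_streams=1000, n_shifted=50, shift=3.0, steps=20, lam=0.2, seed=0):
    """Simulate streams where the first n_shifted have an upward mean shift."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_streams)
    mean[:n_shifted] = shift
    z = np.zeros(n_streams)
    for _ in range(steps):
        z = ewma_update(z, rng.normal(mean, 1.0), lam)
    return benjamini_hochberg(upper_tail_pvalues(z, lam, steps), alpha=0.05)
```

Calling `demo()` returns a boolean mask over the simulated streams. Comparing it with the known shifted indices gives the empirical false discovery proportion and power for that run, which is the kind of comparison the abstract reports for the proposed scheme and its competitors.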
Related papers
50 in total
  • [1] Efficiently tracing clusters over high-dimensional on-line data streams
    Lee, Jae Woo
    Park, Nam Hun
    Lee, Won Suk
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (03) : 362 - 379
  • [2] An Efficient Online Monitoring Method for High-Dimensional Data Streams
    Zou, Changliang
    Wang, Zhaojun
    Jiang, Wei
    Zi, Xuemin
    TECHNOMETRICS, 2015, 57 (03) : 374 - 387
  • [3] Monitoring and root-cause diagnostics of high-dimensional data streams
    Ebrahimi, Samaneh
    Ranjan, Chitta
    Paynabar, Kamran
    JOURNAL OF QUALITY TECHNOLOGY, 2022, 54 (01) : 20 - 43
  • [4] Monitoring of high-dimensional and high-frequency data streams: A nonparametric approach
    Wang, Zhiqiong
    Li, Xin
    Wang, Ying
    Ma, Yanhui
    Xue, Li
    QUALITY TECHNOLOGY AND QUANTITATIVE MANAGEMENT, 2024,
  • [5] A two-stage online monitoring procedure for high-dimensional data streams
    Li, Jun
    JOURNAL OF QUALITY TECHNOLOGY, 2019, 51 (04) : 392 - 406
  • [6] Self-Starting Monitoring and Dynamic Sampling of High-Dimensional Data Streams
    Zhang, Jiahui
    Zheng, Ziqian
    Li, Jun
    Liu, Kaibo
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024,
  • [7] A two-stage monitoring scheme for multiple high-dimensional data streams
    Wang, Tao
    Shi, Pin
    Zang, Qingpei
    Li, Zhonghua
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2024, 94 (12) : 2597 - 2617
  • [8] Detecting Projected Outliers in High-Dimensional Data Streams
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    Liu, Qing
    Xu, Kai
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2009, 5690 : 629 - +
  • [9] Online Pattern Mining for High-Dimensional Data Streams
    Yamamoto, Yoshitaka
    Iwanuma, Koji
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2880 - 2882
  • [10] Generalized projected clustering in high-dimensional data streams
    Wang, T
    FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 772 - 778