On-line monitoring data quality of high-dimensional data streams

Times cited: 10

Authors:

Qi, Dequan [2]

Li, Zhonghua

Wang, Zhaojun [1]

Affiliations:

[1] Nankai University, Institute of Statistics, Tianjin 300071, People's Republic of China

[2] Jilin Medical University, Department of Mathematics, Jilin, Jilin, People's Republic of China

Source:

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION | 2016 / Vol. 86 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Data quality; false discovery rate; MEWMA; statistical process control; CONTROL CHARTS; SCHEMES; IMPACT; TIME

DOI:

10.1080/00949655.2015.1106542

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline codes:

081203; 0835

Abstract:

In recent years, effective monitoring of data quality has increasingly attracted the attention of researchers in statistical process control. However, the existing research on this topic has not used multivariate methods to control multidimensional data quality processes; instead, it has relied on multiple univariate control charts. Based on a novel one-sided multivariate exponentially weighted moving average (MEWMA) chart, we propose a conditional false discovery rate (FDR)-adjusted scheme for on-line monitoring of the data quality of high-dimensional data streams. With thousands of input data streams, the average run length loses its usefulness because out-of-control signals are likely to occur in every time period. Hence, we first control the percentage of signals that are false alarms and then compare the power of the proposed MEWMA scheme with that of two alternative methods. Numerical results show that the proposed MEWMA scheme has higher average power than both competitors.
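The abstract rests on two points: with thousands of streams, per-chart error control (and hence the average run length) breaks down, and a multivariate chart combined with FDR control is a workable alternative. The following is a minimal sketch of that general idea, not the authors' scheme. It makes several assumptions. It uses a standard two-sided MEWMA statistic instead of the paper's one-sided chart. It assumes each stream's p quality metrics are independent N(0, 1) when in control, so the asymptotic covariance of Z_t is λ/(2 − λ)·I. It converts the final-period T² into a chi-square p-value and then applies the plain Benjamini-Hochberg procedure rather than the paper's conditional FDR adjustment. All function names, parameter values, and simulated data are hypothetical.

```python
import math
import random


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for integer df, stdlib only."""
    if x <= 0:
        return 1.0
    # Closed form for df = 2 (even case) or df = 1 (odd case) ...
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # ... then step up with Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def mewma_pvalues(streams, lam: float):
    """Run a standard (two-sided) MEWMA chart per stream; return final-period p-values.

    Assumption: in-control components are independent N(0, 1), so Sigma = I and
    T^2 = (2 - lam) / lam * ||Z_t||^2 is approximately chi-square with p df.
    """
    pvals = []
    for obs in streams:
        p = len(obs[0])
        z = [0.0] * p  # Z_0 = 0 (the in-control mean)
        for x in obs:
            # Z_t = lam * X_t + (1 - lam) * Z_{t-1}
            z = [lam * xi + (1 - lam) * zi for xi, zi in zip(x, z)]
        t2 = (2 - lam) / lam * sum(zi * zi for zi in z)
        pvals.append(chi2_sf(t2, p))
    return pvals


def benjamini_hochberg(pvals, q: float):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


# Hypothetical simulation: 1000 streams, 3 quality metrics each, 100 periods;
# streams 0..49 have every metric shifted by 1.0 from the start.
rng = random.Random(2016)
m, p, n_periods, n_shifted, shift = 1000, 3, 100, 50, 1.0
streams = [
    [[rng.gauss(shift if s < n_shifted else 0.0, 1.0) for _ in range(p)]
     for _ in range(n_periods)]
    for s in range(m)
]
flagged = benjamini_hochberg(mewma_pvalues(streams, lam=0.1), q=0.05)
false_alarms = [s for s in flagged if s >= n_shifted]
print(len(flagged), len(false_alarms))

# Why per-chart control breaks down: with m independent in-control charts, each with
# a per-period false-alarm probability of 0.005, P(at least one alarm) = 1 - 0.995^m.
print(1 - (1 - 0.005) ** m)  # about 0.993 for m = 1000
```

The last line illustrates the abstract's point about the average run length: even a strict per-chart threshold produces some signal in almost every period once m is in the thousands, which is why the sketch controls the proportion of false signals among all signals instead.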

Pages: 2204-2216

Number of pages: 13
