Detecting outliers in streaming time series data from ARM distributed sensors

Cited by: 9
Authors
Lu, Yuping [1]
Kumar, Jitendra [2]
Collier, Nathan [2]
Krishna, Bhargavi [2]
Langston, Michael A. [1]
Affiliations
[1] Univ Tennessee, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
Source
2018 18th IEEE International Conference on Data Mining Workshops (ICDMW) | 2018
Keywords
outlier detection; time series; clustering; atmospheric science; singular spectrum analysis; anomaly detection
DOI
10.1109/ICDMW.2018.00117
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Atmospheric Radiation Measurement (ARM) Data Center at ORNL collects data from a number of permanent and mobile facilities around the globe. The data is then ingested to create high-level scientific products. High-frequency streaming measurements from sensors and radar instruments at ARM sites require a high degree of accuracy to enable rigorous study of atmospheric processes. Outliers in the collected data are common due to instrument failure or extreme weather events, so it is critical to identify and flag them. We employed multiple univariate, multivariate and time series techniques for outlier detection and studied their effectiveness. First, we examined the Pearson correlation coefficient, which measures the pairwise correlations between variables. Singular Spectrum Analysis (SSA) was applied to detect outliers by removing the anticipated annual and seasonal cycles from the signal to accentuate anomalies. K-means was applied for multivariate examination of data from a collection of sensors to identify deviations from expected and known patterns and to identify abnormal observations. The Pearson correlation coefficient, SSA and K-means methods were then combined in a framework that detects outliers through a range of checks. We applied the developed method to data from meteorological sensors at the ARM Southern Great Plains site and validated the results against an existing database of known data quality issues.
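The abstract gives no implementation details, so the following is only an illustrative sketch of what a pairwise Pearson correlation check between two sensor variables might look like. The function names (`pearson`, `flag_windows`), the non-overlapping 24-sample window, the 0.8 threshold, and the synthetic data are all assumptions made for this example; they are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def flag_windows(x, y, window=24, min_corr=0.8):
    """Return start indices of non-overlapping windows in which the
    correlation between x and y is below min_corr or undefined.

    window and min_corr are illustrative defaults, not values from the paper.
    """
    flagged = []
    for start in range(0, len(x) - window + 1, window):
        r = pearson(x[start:start + window], y[start:start + window])
        if math.isnan(r) or r < min_corr:
            flagged.append(start)
    return flagged


# Synthetic data (assumption): two sensors tracking the same daily cycle.
temp_a = [20 + 5 * math.sin(2 * math.pi * t / 24) for t in range(72)]
temp_b = [v + 0.1 for v in temp_a]

# Hypothetical instrument failure: sensor B reports a stuck value on day two.
for t in range(24, 48):
    temp_b[t] = 0.0

flagged = flag_windows(temp_a, temp_b)
print(flagged)
```

In this toy setup only the window containing the stuck sensor is flagged. A real pipeline would need to choose the window length and threshold from the data, and the abstract indicates that such a check is combined with SSA and K-means rather than used alone.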
Pages: 779-786
Page count: 8