Toward automating post processing of aquatic sensor data

Cited by: 8
Authors
Jones, Amber Spackman [1, 2, 3]
Jones, Tanner Lex [4, 5]
Horsburgh, Jeffery S. [1, 2]
Affiliations
[1] Utah State Univ, Dept Civil & Environm Engn, 8200 Old Main Hill, Logan, UT 84322 USA
[2] Utah State Univ, Utah Water Res Lab, 8200 Old Main Hill, Logan, UT 84322 USA
[3] US Geol Survey, Logan, UT 84321 USA
[4] Utah State Univ, Space Dynam Lab, 1695 North Res Pk Way, North Logan, UT USA
[5] Utah State Univ, Dept Elect & Comp Engn, 1695 North Res Pk Way, North Logan, UT USA
Funding
National Science Foundation (USA);
Keywords
Aquatic sensors; Quality control; Anomaly detection; Python; Data management; Software and data availability; QUALITY-CONTROL; TIME-SERIES; ANOMALY DETECTION; WATER-QUALITY; NETWORKS;
DOI
10.1016/j.envsoft.2022.105364
Chinese Library Classification
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
Sensors measuring environmental phenomena at high frequency commonly report anomalies related to fouling, sensor drift and calibration, and datalogging and transmission issues. Suitability of data for analyses and decision making often depends on manual review and adjustment of data. Machine learning techniques have potential to automate identification and correction of anomalies, streamlining the quality control process. We explored approaches for automating anomaly detection and correction of aquatic sensor data for implementation in a Python package (pyhydroqc). We applied both classical and deep learning time series regression models that estimate values, identify anomalies based on dynamic thresholds, and offer correction estimates. Techniques were developed and performance assessed using data reviewed, corrected, and labeled by technicians in an aquatic monitoring use case. Auto-Regressive Integrated Moving Average (ARIMA) consistently performed best, and aggregating results from multiple models improved detection. pyhydroqc includes custom functions and a workflow for anomaly detection and correction.
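The workflow the abstract outlines (estimate values, flag points whose residuals exceed a dynamic threshold, offer a corrected value, and combine flags from several models) can be illustrated with a minimal sketch. This is not the pyhydroqc API and does not use ARIMA or a deep learning model: a rolling median stands in for the regression model, the threshold is an assumed multiple of the recent median absolute residual, and taking the union of flags is an assumed aggregation rule. All function names, parameters, and data below are hypothetical.

```python
from statistics import median


def rolling_median_estimate(values, window):
    """Estimate each point as the median of the `window` points before it.

    Stand-in for a time series regression model (assumption, not ARIMA).
    """
    estimates = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        # With no history yet, fall back to the observation itself.
        estimates.append(median(history) if history else value)
    return estimates


def flag_anomalies(values, estimates, window=10, k=4.0, floor=0.5):
    """Flag points whose absolute residual exceeds a locally varying threshold.

    The threshold is k times the median absolute residual of the preceding
    `window` points, with a lower bound `floor` (all assumed parameters).
    """
    residuals = [abs(v - e) for v, e in zip(values, estimates)]
    flags = []
    for i, r in enumerate(residuals):
        recent = residuals[max(0, i - window):i]
        spread = median(recent) if recent else 0.0
        flags.append(r > max(k * spread, floor))
    return flags


# Synthetic series (hypothetical data): a small repeating pattern with one spike.
values = [10.0 + 0.1 * (i % 5) for i in range(40)]
values[25] = 25.0

# Two "models" that differ only in window length.
est_a = rolling_median_estimate(values, window=5)
est_b = rolling_median_estimate(values, window=3)
flags_a = flag_anomalies(values, est_a)
flags_b = flag_anomalies(values, est_b)

# Aggregate: a point is anomalous if either model flags it (assumed rule).
combined = [a or b for a, b in zip(flags_a, flags_b)]
anomalies = [i for i, flagged in enumerate(combined) if flagged]

# Correction: replace flagged points with the first model's estimate.
corrected = [e if f else v for v, e, f in zip(values, est_a, combined)]
```

In this sketch only the injected spike is flagged and replaced; unflagged observations pass through unchanged. A real application would substitute fitted model predictions for the rolling median and tune the threshold parameters against labeled data.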
Pages: 24
Related papers
55 in total
[1]   Unsupervised real-time anomaly detection for streaming data [J].
Ahmad, Subutai ;
Lavin, Alexander ;
Purdy, Scott ;
Agha, Zuha .
NEUROCOMPUTING, 2017, 262 :134-147
[2]  
[Anonymous], 2007, P 23 C UNC ART INT U
[3]   Quantity is Nothing without Quality: Automated QA/QC for Streaming Environmental Sensor Data [J].
Campbell, John L. ;
Rustad, Lindsey E. ;
Porter, John H. ;
Taylor, Jeffrey R. ;
Dereszynski, Ethan W. ;
Shanley, James B. ;
Gries, Corinna ;
Henshaw, Donald L. ;
Martin, Mary E. ;
Sheldon, Wade M. ;
Boose, Emery R. .
BIOSCIENCE, 2013, 63 (07) :574-585
[4]   Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh - A Python package) [J].
Christ, Maximilian ;
Braun, Nils ;
Neuffer, Julius ;
Kempa-Liehr, Andreas W. .
NEUROCOMPUTING, 2018, 307 :72-77
[5]  
Conde E.F, 2011, LEARNING
[6]   Anomaly Detection for IoT Time-Series Data: A Survey [J].
Cook, Andrew A. ;
Misirli, Goksel ;
Fan, Zhong .
IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (07) :6481-6494
[7]  
Fiebrich CA, 2010, J ATMOS OCEAN TECH, V27, P1565, DOI 10.1175/2010JTECHA1433.1
[8]  
Galarus D.E., 2012, FLAIRS Conf, P388
[9]  
Geron A, 2017, NO HAND ON MACHINE L
[10]   Environmental Data Science [J].
Gibert, Karina ;
Horsburgh, Jeffery S. ;
Athanasiadis, Ioannis N. ;
Holmes, Geoff .
ENVIRONMENTAL MODELLING & SOFTWARE, 2018, 106 :4-12