PyLEnM: A Machine Learning Framework for Long-Term Groundwater Contamination Monitoring Strategies

Times cited: 25
Authors
Meray, Aurelien O. [1]
Sturla, Savannah [2]
Siddiquee, Masudur R. [1]
Serata, Rebecca [3]
Uhlemann, Sebastian [4]
Gonzalez-Raymat, Hansell [5]
Denham, Miles [6]
Upadhyay, Himanshu [1]
Lagos, Leonel E. [1]
Eddy-Dilek, Carol [5]
Wainwright, Haruko M. [4, 7]
Affiliations
[1] Florida Int Univ, Appl Res Ctr, Miami, FL 33174 USA
[2] Univ Calif Berkeley, Dept Environm Sci Policy & Management, Berkeley, CA 94709 USA
[3] Univ Calif Berkeley, Dept Civil & Environm Engn, Berkeley, CA 94709 USA
[4] Lawrence Berkeley Natl Lab, Climate & Ecosyst Sci Div, Berkeley, CA 94704 USA
[5] Savannah River Natl Lab, Aiken, SC 29808 USA
[6] Panoram Environm Consulting LLC, Aiken, SC 29802 USA
[7] MIT, Dept Nucl Sci & Engn, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
open-source package; machine learning; spatial estimation; sensor placement optimization; Gaussian process model; unsupervised learning; groundwater contamination; PLUME
DOI
10.1021/acs.est.1c07440
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Subject classification codes
08; 0830
Abstract
In this study, we have developed a comprehensive machine learning (ML) framework for long-term groundwater contamination monitoring as the Python package PyLEnM (Python for Long-term Environmental Monitoring). PyLEnM aims to establish a seamless data-to-ML pipeline with various utility functions, such as quality assurance and quality control (QA/QC), coincident/colocated data identification, the automated ingestion and processing of publicly available spatial data layers, and novel data summarization/visualization. The key ML innovations include (1) time series/multianalyte clustering to find well groups that have similar groundwater dynamics and to inform spatial interpolation and well optimization, (2) automated model selection and parameter tuning, comparing multiple regression models for spatial interpolation, (3) a proxy-based spatial interpolation method that includes spatial data layers or in situ measurable variables as predictors for contaminant concentrations and groundwater levels, and (4) a new well optimization algorithm to identify the most effective subset of wells for maintaining the spatial interpolation ability for long-term monitoring. We demonstrate our methodology using the monitoring data at the Savannah River Site F-Area. Through this open-source PyLEnM package, we aim to improve the transparency of data analytics at contaminated sites, empowering concerned citizens as well as improving public relations.
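The well optimization idea in item (4), keeping a subset of wells that still supports spatial interpolation, can be illustrated with a small self-contained sketch. It is not PyLEnM's algorithm or API: `gp_predict`, `interpolation_rmse`, and `greedy_well_reduction` are hypothetical names, the Gaussian process uses a constant (sample) mean and a squared-exponential kernel with an assumed, fixed length scale and noise term (no automated tuning), and the subset search is a plain greedy backward elimination swapped in for illustration. At each step it drops the well whose removal least increases the root-mean-square error of interpolating all original wells from the wells that remain.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) well location


def rbf(a: Point, b: Point, length: float) -> float:
    """Squared-exponential covariance between two well locations."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-0.5 * d2 / length ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def gp_predict(train_xy: Sequence[Point], train_z: Sequence[float],
               query: Sequence[Point], length: float = 1.0,
               noise: float = 1e-6) -> List[float]:
    """Hypothetical GP interpolator with fixed (assumed) hyperparameters."""
    mean = sum(train_z) / len(train_z)
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_xy)]
         for i, a in enumerate(train_xy)]
    alpha = solve(K, [z - mean for z in train_z])  # K^{-1} (z - mean)
    return [mean + sum(rbf(q, p, length) * w for p, w in zip(train_xy, alpha))
            for q in query]


def interpolation_rmse(subset: Sequence[int], xy: Sequence[Point],
                       z: Sequence[float], length: float) -> float:
    """RMSE of predicting every original well from the wells in `subset`."""
    pred = gp_predict([xy[i] for i in subset], [z[i] for i in subset], xy, length)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, z)) / len(z))


def greedy_well_reduction(xy: Sequence[Point], z: Sequence[float], k: int,
                          length: float = 1.0) -> List[int]:
    """Hypothetical greedy backward elimination down to k wells."""
    if k < 1:
        raise ValueError("k must be at least 1")
    keep = list(range(len(xy)))
    while len(keep) > k:
        # Remove the well whose absence hurts interpolation of the full network least.
        least_useful = min(keep, key=lambda w: interpolation_rmse(
            [i for i in keep if i != w], xy, z, length))
        keep.remove(least_useful)
    return sorted(keep)


if __name__ == "__main__":
    wells = [(0.0, 0.0), (0.05, 0.0), (3.0, 0.0), (6.0, 0.0)]
    values = [1.0, 1.0, 2.0, 0.5]  # made-up concentrations for illustration
    print(greedy_well_reduction(wells, values, k=3))
```

In this toy network, two wells sit almost on top of each other, so the sketch should keep both distant wells and only one of the near-duplicates. Greedy elimination needs O(n²) model fits, which is manageable for tens of wells, whereas an exhaustive search over all well subsets grows combinatorially.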
Pages: 5973-5983
Page count: 11