Siamese Network-Based Transfer Learning Model to Predict Geogenic Contaminated Groundwaters

Cited by: 15
Authors
Cao, Hailong [1, 2]
Xie, Xianjun [1, 2]
Shi, Jianbo [1, 2]
Jiang, Guibin [3]
Wang, Yanxin [1, 2]
Affiliations
[1] China Univ Geosci, Sch Environm Studies, Wuhan 430074, Peoples R China
[2] China Univ Geosci, State Key Lab Biogeol & Environm Geol, Wuhan 430074, Peoples R China
[3] Chinese Acad Sci, Res Ctr Ecoenvironm Sci, State Key Lab Environm Chem & Ecotoxicol, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
groundwater; Siamese network; transfer learning; class-imbalanced data; prediction; ARSENIC CONTAMINATION; FLUORIDE; IODINE; CHINA; WELLS;
DOI
10.1021/acs.est.1c08682
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Exposure to geogenic contaminated groundwaters (GCGs) is a significant public health concern. Machine learning models are powerful tools for the discovery of potential GCGs. However, insufficient groundwater quality data and the fact that GCGs are typically a minority class in the data prevent models from producing meaningful GCG predictions. To address this issue, a deep learning method, Siamese network-based transfer learning (SNTL), is used to estimate the probability that hazardous substances are present in groundwater above a threshold based on limited and class-imbalanced data. SNTL greatly reduces the amount of required training data and eliminates the negative effects of class-imbalanced data on prediction model performance. The predictions of three typical GCGs (high arsenic/fluoride/iodine groundwater) show that the SNTL models provide higher (about 80%) and more balanced sensitivity and specificity than benchmark Random Forest models, indicating that SNTL models can predict both GCGs and non-GCGs. Therefore, SNTL can help protect populations from GCG exposure in areas where other prediction methods fail to provide risk information because of poor groundwater quality data.
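The abstract's emphasis on balanced sensitivity and specificity can be illustrated with a minimal sketch. It is not the authors' code: the function name `sensitivity_specificity` and the toy labels (1 = GCG, 0 = non-GCG, with GCGs making up 10% of samples) are assumptions chosen only to show why overall accuracy is misleading on class-imbalanced data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # GCG correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # GCG missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-GCG correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-GCG wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 10 contaminated wells, 90 uncontaminated wells.
y_true = [1] * 10 + [0] * 90

# A degenerate model that always predicts the majority class (non-GCG).
always_negative = [0] * 100

# A hypothetical model that is right about 80% of each class.
balanced_model = [1] * 8 + [0] * 2 + [0] * 72 + [1] * 18

for name, y_pred in [("always negative", always_negative),
                     ("balanced", balanced_model)]:
    sens, spec = sensitivity_specificity(y_true, y_pred)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"{name}: accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the majority-class predictor scores higher accuracy than the balanced model while detecting no contaminated wells at all, which is why the two class-wise rates are reported separately.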
Pages: 11071-11079
Number of pages: 9