Evaluating the validity of class balancing algorithms-based machine learning models for geogenic contaminated groundwaters prediction

Cited by: 13
Authors
Cao, Hailong [1]
Xie, Xianjun [1]
Shi, Jianbo [1]
Wang, Yanxin [1]
Affiliations
[1] China Univ Geosci, Sch Environm Studies, State Key Lab Biogeol & Environm Geol, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Machine learning; Groundwater; Arsenic; Weighted cross-entropy; Adaptive synthetic sampling; ARSENIC CONTAMINATION; FLUORIDE; WELLS
DOI
10.1016/j.jhydrol.2022.127933
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Data-driven machine learning models have been used to predict the levels of hazardous substances in groundwater. However, class-imbalanced data can produce models with grossly low sensitivity despite high overall accuracy. To address this issue, four algorithms (weighted cross-entropy loss, random oversampling, random undersampling, and adaptive synthetic sampling (ADASYN)) were tested for their validity in improving model sensitivity. Tests using geogenic high-arsenic groundwater data from the Datong Basin, the Red River Delta of Vietnam, Bangladesh, Texas, and California showed that all four algorithms produced more accurate predictions, with an average increase in sensitivity of 53.8% compared to the raw models. ADASYN performed best of the four and increased model G-means (the geometric mean of sensitivity and specificity) by >40% on average. The ADASYN-optimized ANN models predicted a higher groundwater As exposure risk in Ghana than in Ethiopia.
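The gap between overall accuracy and sensitivity described in the abstract can be illustrated with a minimal sketch. The labels below are hypothetical toy data (90 low-arsenic wells, 10 high-arsenic wells), not values from the paper, and the helper names `confusion_counts` and `g_mean` are assumptions made for illustration only.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def g_mean(y_true, y_pred):
    """Return (sensitivity, specificity, G-mean) for binary predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, sqrt(sensitivity * specificity)

# Hypothetical imbalanced test set: 90 negatives followed by 10 positives.
y_true = [0] * 90 + [1] * 10
# Hypothetical model output: every negative is correct, but only 2 of 10 positives are found.
y_pred = [0] * 90 + [1] * 2 + [0] * 8

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec, gm = g_mean(y_true, y_pred)

print(f"accuracy={accuracy:.3f} sensitivity={sens:.3f} "
      f"specificity={spec:.3f} g_mean={gm:.3f}")
```

In this toy case the accuracy is 0.92 while the sensitivity is only 0.2, so the G-mean falls to about 0.447. This is why a metric that combines sensitivity and specificity is more informative than accuracy alone on imbalanced data.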
Pages: 10
Related papers
58 in total
[1]   Total coliforms, arsenic and cadmium exposure through drinking water in the Western Region of Ghana: application of multivariate statistical technique to groundwater quality [J].
Affum, Andrews Obeng ;
Osae, Shiloh Dede ;
Nyarko, Benjamin Jabez Botwe ;
Afful, Samuel ;
Fianko, Joseph Richmond ;
Akiti, Tetteh Thomas ;
Adomako, Dickson ;
Acquaah, Samuel Osafo ;
Dorleku, Micheal ;
Antoh, Emmanuel ;
Barnes, Felix ;
Affum, Enoch Acheampong .
ENVIRONMENTAL MONITORING AND ASSESSMENT, 2015, 187 (02)
[2]  
[Anonymous], 1995, AREAL DISTRIBUTION S, DOI 10.3133/WRI954048
[3]  
[Anonymous], 2016, DEEP LEARNING
[4]   Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function [J].
Aurelio, Yuri Sousa ;
de Almeida, Gustavo Matheus ;
de Castro, Cristiano Leite ;
Braga, Antonio Padua .
NEURAL PROCESSING LETTERS, 2019, 50 (02) :1937-1949
[5]   Estimating the High-Arsenic Domestic-Well Population in the Conterminous United States [J].
Ayotte, Joseph D. ;
Medalie, Laura ;
Qi, Sharon L. ;
Backer, Lorraine C. ;
Nolan, Bernard T. .
ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2017, 51 (21) :12443-12454
[6]   Predicting Arsenic in Drinking Water Wells of the Central Valley, California [J].
Ayotte, Joseph D. ;
Nolan, Bernard T. ;
Gronberg, Jo Ann .
ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2016, 50 (14) :7555-7563
[7]   The interactive natural drivers of global geogenic arsenic contamination of groundwater [J].
Cao, Hailong ;
Xie, Xianjun ;
Wang, Yanxin ;
Deng, Yamin .
JOURNAL OF HYDROLOGY, 2021, 597
[8]   Prediction of contamination potential of groundwater arsenic in Cambodia, Laos, and Thailand using artificial neural network [J].
Cho, Kyung Hum ;
Sthiannopkao, Suthipong ;
Pachepsky, Yakou A. ;
Kim, Kyoung-Woong ;
Kim, Joon Ha .
WATER RESEARCH, 2011, 45 (17) :5535-5544
[9]   Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA-Modeling regional occurrence with pH, redox, and machine learning [J].
DeSimone, Leslie A. ;
Ransom, Katherine M. .
JOURNAL OF HYDROLOGY-REGIONAL STUDIES, 2021, 37
[10]  
Erickson ML, 2018, WATER RESOUR RES, V54, P10172, DOI 10.1029/2018WR023106