Prediction of water quality parameters using machine learning models: a case study of the Karun River, Iran

Cited by: 38
Authors
Nouraki, Atefeh [1]
Alavi, Mohammad [1]
Golabi, Mona [1]
Albaji, Mohammad [1]
Affiliations
[1] Shahid Chamran Univ Ahvaz, Fac Water & Environm Engn, Dept Irrigat & Drainage, Ahvaz, Iran
Keywords
M5P model tree; Principal component analysis; Random forest regression; Support vector regression; Water quality prediction; REGRESSION-MODELS; TIGRIS RIVER; ALGORITHMS; BAGHDAD;
DOI
10.1007/s11356-021-14560-8
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Accurate water quality prediction plays an essential role in improving water management and pollution control. Machine learning models have been successfully used to model total dissolved solids (TDS), sodium absorption ratio (SAR) and total hardness (TH) in aquatic ecosystems with insufficient data. However, because of multiple pollution sources and the complex behaviour of pollutants, the effectiveness of these models in predicting TDS, SAR and TH levels in the Karun River system is still unclear. To address this problem, multiple linear regression (MLR), M5P model tree, support vector regression (SVR) and random forest regression (RFR) models were used to predict TDS, SAR and TH at four stations on the Karun River for the period 1999-2019. First, principal component analysis (PCA) was used to reduce the number of input variables. The developed models were evaluated in terms of the coefficient of determination (R²) and the root mean square error (RMSE). Based on the PCA, sodium (Na), chloride (Cl) and TH were found to be the most influential inputs for TDS, Na and Cl the most influential for SAR, and calcium (Ca) and magnesium (Mg) the most influential for TH. The results indicated that the RFR, SVR and MLR models had the lowest error in predicting TDS, SAR and TH, respectively, at all stations. The best performance was obtained at Darkhovin station: the RFR model for TDS (R² = 0.98, RMSE = 70.50 mg L⁻¹), the SVR model for SAR (R² = 0.99, RMSE = 0.04) and the MLR model for TH (R² = 0.99, RMSE = 1.54 mg L⁻¹). The comparison of the results indicated that the machine learning models could satisfactorily estimate TDS, SAR and TH at all stations.
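The abstract compares models by R² and RMSE. As a minimal sketch of how those two scores can be computed, the Python below uses the common definition R² = 1 - SS_res / SS_tot; this is an assumption, since the record does not say which formula the authors used (some studies report the squared Pearson correlation instead). The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Made-up TDS values (mg/L) for illustration only; not data from the study.
observed_tds = [1000.0, 1200.0, 1400.0, 1600.0]
predicted_tds = [1010.0, 1190.0, 1420.0, 1580.0]

print(f"RMSE = {rmse(observed_tds, predicted_tds):.2f} mg/L")
print(f"R^2  = {r_squared(observed_tds, predicted_tds):.3f}")
```

RMSE carries the units of the target variable, so the values reported for TDS and TH (mg L⁻¹) and for SAR (dimensionless) are not directly comparable with one another, whereas R² is unitless.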
Pages: 57060-57072
Number of pages: 13