Integration of Multivariate Adaptive Regression Splines and Weighted Arithmetic Water Quality Index Methods for Drinking Water Quality Analysis

Cited by: 4
Authors
Jumber, Marshet B. [1 ]
Damtie, Menwagaw T. [1 ]
Tegegne, Desalegn [2 ]
Affiliations
[1] Debre Tabor Univ, Hydraul & Water Resources Engn, Debre Tabor 272, Ethiopia
[2] Int Water Management Inst IWMI, Addis Ababa, Ethiopia
Keywords
Water quality index; Machine learning; R programming; Weighted arithmetic WQI; PREDICTION;
DOI
10.1007/s41101-024-00239-x
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The water quality index (WQI) is a widely used tool for assessing the water quality of various water bodies, but it has drawn criticism for not being globally transferable and for taking a more physical approach. The present study aimed to assess water quality indices using the weighted arithmetic method and to investigate an alternative approach that improves the prediction accuracy of the WQI by applying machine learning algorithms. These include artificial neural networks, decision trees, random forest, gradient boosting machine, multivariate adaptive regression splines (MARS), a Gaussian process with a radial basis function kernel, a support vector machine with a radial basis function kernel, a hybrid of Bayesian and ridge regression, and K-nearest neighbor. The spatiotemporal assessment of the WQI revealed considerable fluctuation, which requires further research to determine the potential causes. The water quality dataset was split into training (70%) and testing (30%) datasets, and tenfold cross-validation was used to compare models and optimize hyperparameters on various subsets of the dataset. The results revealed that almost all of the deployed machine learning models performed well on the training dataset. With the normalized dataset, the MARS model outperformed the others in both the training phase (RMSE = 0.044, R² = 0.89, MAE = 0.025) and the testing phase (RMSE = 0.090, R² = 0.87, MAE = 0.061). The worst prediction performance on the test dataset was attained by kernel-based models such as the Gaussian process and the support vector machine, possibly as a result of overfitting during model building. Finally, a MARS model equation employing three strongly influential water quality parameters (E. coli, free residual chlorine, and turbidity) was suggested to predict the water quality index for drinking purposes.
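The abstract names the weighted arithmetic method and the RMSE, R², and MAE metrics without stating their formulas. The Python sketch below is illustrative only and is not the authors' code (the keywords indicate the study used R). It assumes the commonly cited weighted arithmetic form, WQI = Σ(W_i·Q_i) / Σ W_i, with Q_i = 100·(V_i − V_0)/(S_i − V_0) and W_i = K/S_i, and it assumes R² is computed as 1 − SS_res/SS_tot. The function names, parameter names, and limit values are hypothetical. A parameter whose permissible limit is zero (as may apply to E. coli) would need special handling that this sketch does not cover.

```python
import math


def weighted_arithmetic_wqi(measured, standards, ideals=None):
    """Weighted arithmetic WQI: sum(W_i * Q_i) / sum(W_i).

    measured  -- observed value V_i per parameter
    standards -- permissible limit S_i per parameter (must be non-zero)
    ideals    -- ideal value V_0 per parameter (assumed 0 when absent)
    """
    ideals = ideals or {}
    # Proportionality constant K so that the unit weights K / S_i sum to 1.
    k = 1.0 / sum(1.0 / s for s in standards.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, limit in standards.items():
        ideal = ideals.get(name, 0.0)
        # Quality rating Q_i: 0 at the ideal value, 100 at the limit.
        rating = 100.0 * (measured[name] - ideal) / (limit - ideal)
        weight = k / limit
        weighted_sum += weight * rating
        weight_total += weight
    return weighted_sum / weight_total


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical limits for illustration only; not taken from the paper.
example_standards = {"turbidity": 5.0, "ph": 8.5}
example_ideals = {"ph": 7.0}
example_sample = {"turbidity": 2.5, "ph": 7.0}
example_wqi = weighted_arithmetic_wqi(
    example_sample, example_standards, example_ideals
)
```

In this form, a sample sitting exactly at every permissible limit scores 100, and lower scores indicate better quality. Because weights scale with 1/S_i, parameters with stricter limits dominate the index.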
Pages: 11