Comparison of individual and ensemble machine learning models for prediction of sulphate levels in untreated and treated Acid Mine Drainage

Cited by: 0
Authors
Taskeen Hasrod
Yannick B. Nuapia
Hlanganani Tutu
Affiliations
[1] University of the Witwatersrand, Molecular Sciences Institute, School of Chemistry
[2] University of Limpopo, Pharmacy Department, School of Healthcare Sciences
[3] Turfloop Campus
Source
Environmental Monitoring and Assessment | 2024 / Vol. 196
Keywords
Acid Mine Drainage; Sulphate; Machine learning; Regression; Stacking ensemble machine learning; Environmental chemistry
DOI
Not available
Abstract
Machine learning (ML) was used to provide data for further evaluation of the potential extraction of octathiocane (S8), a commercially useful by-product, from Acid Mine Drainage (AMD) by predicting sulphate levels in an AMD water quality dataset. Individual ML regressor models, namely Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge (RD), Elastic Net (EN), K-Nearest Neighbours (KNN), Support Vector Regression (SVR), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Multi-Layer Perceptron Artificial Neural Network (MLP), as well as Stacking Ensemble (SE-ML) combinations of these models, were successfully used to predict sulphate levels. The best-performing model was an SE-ML regressor trained on untreated AMD that stacked seven of the best-performing individual models and fed their outputs to an LR meta-learner; it achieved a Mean Squared Error (MSE) of 0.000011, a Mean Absolute Error (MAE) of 0.002617 and an R² of 0.9997. Temperature (°C), Total Dissolved Solids (mg/L) and, importantly, iron (mg/L) were highly correlated with sulphate (mg/L), with iron showing a strong positive linear correlation that indicated dissolved products from pyrite oxidation. Ensemble learning methods (bagging, boosting and stacking) outperformed individual methods owing to their combined predictive accuracy. Surprisingly, an SE-ML that combined all models and an SE-ML that combined only the best-performing models differed only slightly in accuracy, which indicated that including poorly performing models in the stack had no adverse effect on its predictive performance.
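The three scores quoted in the abstract (MSE, MAE and R2) have standard definitions, which the following minimal sketch spells out. It is not the authors' code: the function name `regression_metrics` and the example values are hypothetical, and nothing here is taken from the paper's dataset.

```python
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for two equal-length sequences of numbers.

    Hypothetical helper for illustration; standard textbook definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Residual = observed value minus predicted value.
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Squared Error: average of squared residuals.
    mse = fmean([r * r for r in residuals])
    # Mean Absolute Error: average of absolute residuals.
    mae = fmean([abs(r) for r in residuals])

    # R^2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean.
    mean_true = fmean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "mae": mae, "r2": r2}


# Made-up values purely for illustration (not sulphate measurements).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(regression_metrics(observed, predicted))
```

Lower MSE and MAE indicate smaller errors, and an R2 close to 1 indicates that the predictions explain nearly all of the variance in the observed values. MSE and MAE depend on the scale of the target variable, so very small values such as those in the abstract are only meaningful relative to how the sulphate values were scaled, which the abstract does not state.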