Comparison of individual and ensemble machine learning models for prediction of sulphate levels in untreated and treated Acid Mine Drainage

Cited by: 0
Authors
Taskeen Hasrod
Yannick B. Nuapia
Hlanganani Tutu
Affiliations
[1] University of the Witwatersrand, Molecular Sciences Institute, School of Chemistry
[2] University of Limpopo, Pharmacy Department, School of Healthcare Sciences
[3] Turfloop Campus
Source
Environmental Monitoring and Assessment | 2024 / Vol. 196
Keywords
Acid Mine Drainage; Sulphate; Machine learning; Regression; Stacking ensemble machine learning; Environmental chemistry
DOI
Not available
Abstract
Machine learning (ML) was used to predict sulphate levels in an Acid Mine Drainage (AMD) water quality dataset, providing data for further evaluation of the potential extraction of octathiocane (S₈), a commercially useful by-product, from AMD. Individual ML regressor models, namely Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge (RD), Elastic Net (EN), K-Nearest Neighbours (KNN), Support Vector Regression (SVR), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Multi-Layer Perceptron Artificial Neural Network (MLP), as well as Stacking Ensemble (SE-ML) combinations of these models, were successfully used to predict sulphate levels. The best-performing model was an SE-ML regressor trained on untreated AMD that stacked seven of the best-performing individual models and fed their predictions to an LR meta-learner; it achieved a Mean Squared Error (MSE) of 0.000011, a Mean Absolute Error (MAE) of 0.002617 and an R² of 0.9997. Temperature (°C), Total Dissolved Solids (mg/L) and, importantly, iron (mg/L) were highly correlated with sulphate (mg/L); iron showed a strong positive linear correlation, which indicated dissolved products of pyrite oxidation. Ensemble learning methods (bagging, boosting and stacking) outperformed individual methods because they combine the predictive accuracies of multiple models. Surprisingly, SE-ML that combined all models and SE-ML that combined only the best-performing models differed only slightly in accuracy, which indicated that including poorly performing models in the stack had no adverse effect on its predictive performance.
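The stacking setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_regression` as a stand-in for the AMD water quality data, and stacks an arbitrary handful of base regressors (the abstract does not say which seven models were stacked). The function name `evaluate_stack` and all hyperparameters are hypothetical choices made for illustration. Only the overall shape follows the abstract: base regressors whose cross-validated predictions feed an LR meta-learner, scored with MSE, MAE and R².

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def evaluate_stack(random_state=0):
    """Fit a stacking regressor on synthetic data and return test metrics."""
    # Assumption: synthetic regression data stands in for the AMD dataset.
    X, y = make_regression(
        n_samples=300, n_features=5, noise=5.0, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )

    # Assumption: an illustrative subset of the model families named in the
    # abstract; scale-sensitive models are wrapped with a StandardScaler.
    base_models = [
        ("rd", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=random_state)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=random_state)),
    ]

    # Out-of-fold predictions of the base models (5-fold CV) are the inputs
    # to the linear-regression meta-learner.
    stack = StackingRegressor(
        estimators=base_models, final_estimator=LinearRegression(), cv=5
    )
    stack.fit(X_train, y_train)
    y_pred = stack.predict(X_test)

    return {
        "mse": mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }


if __name__ == "__main__":
    print(evaluate_stack())
```

Adding a weak base model to `base_models` and comparing the returned metrics is one way to probe the abstract's observation that poorly performing models did not degrade the stack; on synthetic data this only illustrates the mechanism and does not reproduce the reported result.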