Predicting water quality variables using gradient boosting machine: global versus local explainability using SHapley Additive Explanations (SHAP)

Cited by: 0
Authors
Merabet, Khaled [1 ]
Di Nunno, Fabio [2 ]
Granata, Francesco [2 ]
Kim, Sungwon [3 ]
Adnan, Rana Muhammad [4 ,7 ]
Heddam, Salim [1 ]
Kisi, Ozgur [5 ,8 ]
Zounemat-Kermani, Mohammad [6 ]
Affiliations
[1] Univ 20 Aout 1955, Fac Sci, Agron Dept, Hydraul Div, Route El Hadaik,BP 26, Skikda, Algeria
[2] Univ Cassino & Southern Lazio, Dept Civil & Mech Engn DICEM, Via Biasio, 43, I-03043 Cassino, Frosinone, Italy
[3] Dongyang Univ, Dept Railroad Construct & Safety Engn, Yeongju 36040, South Korea
[4] Guangzhou Univ, Coll Architecture & Urban Planning, Guangzhou 510006, Peoples R China
[5] Ilia State Univ, Sch Technol, Dept Civil Engn, Tbilisi 0179, Georgia
[6] Shahid Bahonar Univ Kerman, Dept Civil Engn, Kerman, Iran
[7] Saveetha Inst Med & Tech Sci, Ctr global Hlth Res, Chennai 600001, India
[8] Korea Univ, Sch Civil Environm & Architectural Engn, Seoul 02841, South Korea
Keywords
Modelling; Water quality; Chl-a; DO; TU; AdaBoost; Boosting models; SHAP; SHORT-TERM-MEMORY; DISSOLVED-OXYGEN; LEARNING-MODEL; XGBOOST; RIVER; FRAMEWORK;
DOI
10.1007/s12145-025-01796-y
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Water quality assessment is critical for ensuring the health of aquatic ecosystems and managing water resources effectively. However, accurately predicting key water quality variables remains challenging due to the complex interactions between environmental factors and anthropogenic influences. In the present investigation, a new modelling framework is proposed for better prediction of three water quality variables, namely: (i) dissolved oxygen concentration (DO), (ii) water turbidity (TU), and (iii) water chlorophyll a (Chl-a). Six machine learning models, i.e., adaptive boosting (AdaBoost), categorical boosting (CatBoost), histogram gradient boosting (HistGBRT), light gradient boosting machine (LightGBM), natural gradient boosting (NGBoost), and extreme gradient boosting (XGBoost), were applied and compared using combinations of a large number of water quality variables. All models were developed using data collected from three stations: (i) USGS 05543010 Illinois River at Seneca, Illinois County, (ii) USGS 05586300 Illinois River at Florence, Illinois County, and (iii) USGS 05553700 Illinois River at Starved Rock, Illinois County, USA. The SHapley Additive exPlanations (SHAP) method was adopted in the present study for model interpretability and feature ranking. Furthermore, all models were compared using various numerical indices and graphical representations. The results support the following conclusions. DO concentration can be predicted very well, and the CatBoost model was found to be the best, with excellent numerical indices: RMSE (0.430), MAE (0.326), R (0.980) and NSE (0.961). For Chl-a, all models were found to be less accurate, and the best performance was obtained using LightGBM, with RMSE (5.916), MAE (4.294), R (0.892) and NSE (0.795). For water TU, none of the models was found to be accurate, and very poor performance was obtained. Finally, the use of SHAP significantly improved understanding of the overall contribution of the various water quality variables to the final prediction.
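The abstract reports four numerical indices (RMSE, MAE, R and NSE) without stating their formulas. The sketch below shows how such indices are commonly computed. It assumes the standard textbook definitions, with R taken to be the Pearson correlation coefficient and NSE the Nash-Sutcliffe efficiency. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, Pearson R and NSE for two equal-length sequences.

    Assumption: standard definitions; the paper's exact formulas are not
    given in the abstract.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - o for o, p in zip(observed, predicted)]
    sq_err = sum(e * e for e in errors)

    rmse = math.sqrt(sq_err / n)              # root mean square error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("R and NSE are undefined for constant series")

    r = cov / math.sqrt(ss_o * ss_p)          # Pearson correlation coefficient
    nse = 1.0 - sq_err / ss_o                 # Nash-Sutcliffe efficiency
    return {"RMSE": rmse, "MAE": mae, "R": r, "NSE": nse}


# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

With these definitions, a constant offset between predictions and observations leaves R at 1 while lowering NSE, which is one reason the two indices are usually reported together.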
Pages: 34