Predicting water quality variables using gradient boosting machine: global versus local explainability using SHapley Additive Explanations (SHAP)

Cited by: 0
Authors
Merabet, Khaled [1]
Di Nunno, Fabio [2]
Granata, Francesco [2]
Kim, Sungwon [3]
Adnan, Rana Muhammad [4, 7]
Heddam, Salim [1]
Kisi, Ozgur [5, 8]
Zounemat-Kermani, Mohammad [6]
Affiliations
[1] Univ 20 Aout 1955, Fac Sci, Agron Dept, Hydraul Div, Route El Hadaik,BP 26, Skikda, Algeria
[2] Univ Cassino & Southern Lazio, Dept Civil & Mech Engn DICEM, Via Biasio, 43, I-03043 Cassino, Frosinone, Italy
[3] Dongyang Univ, Dept Railroad Construct & Safety Engn, Yeongju 36040, South Korea
[4] Guangzhou Univ, Coll Architecture & Urban Planning, Guangzhou 510006, Peoples R China
[5] Ilia State Univ, Sch Technol, Dept Civil Engn, Tbilisi 0179, Georgia
[6] Shahid Bahonar Univ Kerman, Dept Civil Engn, Kerman, Iran
[7] Saveetha Inst Med & Tech Sci, Ctr global Hlth Res, Chennai 600001, India
[8] Korea Univ, Sch Civil Environm & Architectural Engn, Seoul 02841, South Korea
Keywords
Modelling; Water quality; Chl-a; DO; TU; AdaBoost; Boosting models; SHAP; SHORT-TERM-MEMORY; DISSOLVED-OXYGEN; LEARNING-MODEL; XGBOOST; RIVER; FRAMEWORK;
DOI
10.1007/s12145-025-01796-y
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Water quality assessment is critical for ensuring the health of aquatic ecosystems and managing water resources effectively. However, accurately predicting key water quality variables remains challenging because of the complex interactions between environmental factors and anthropogenic influences. This study proposes a new modelling framework for improved prediction of three water quality variables: (i) dissolved oxygen concentration (DO), (ii) water turbidity (TU), and (iii) chlorophyll a (Chl-a). Six machine learning models, namely adaptive boosting (AdaBoost), categorical boosting (CatBoost), histogram gradient boosting (HistGBRT), light gradient boosting machine (LightGBM), natural gradient boosting (NGBoost), and extreme gradient boosting (XGBoost), were applied and compared based on combinations of a large number of water quality variables. All models were developed using data collected from three stations: (i) USGS 05543010 Illinois River at Seneca, Illinois County, (ii) USGS 05586300 Illinois River at Florence, Illinois County, and (iii) USGS 05553700 Illinois River at Starved Rock, Illinois County, USA. SHapley Additive exPlanations (SHAP) was adopted for model interpretability and feature ranking, and all models were compared using various numerical indices and graphical representations. The results support the following conclusions. DO concentration was predicted very well; CatBoost was the best model, with RMSE = 0.430, MAE = 0.326, R = 0.980, and NSE = 0.961. For Chl-a, all models were less accurate, and the best performance was obtained with LightGBM, with RMSE = 5.916, MAE = 4.294, R = 0.892, and NSE = 0.795. For TU, none of the models was accurate, and performance was very poor. Finally, SHAP substantially improved understanding of the overall contribution of the individual water quality variables to the final prediction.
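The four indices reported in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of root mean square error (RMSE), mean absolute error (MAE), the Pearson correlation coefficient (R), and the Nash-Sutcliffe efficiency (NSE); this record does not state how the authors computed them. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper or its stations.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, Pearson R, and Nash-Sutcliffe efficiency (NSE).

    Standard definitions are assumed; the paper's exact formulas are not
    given in this record.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Prediction errors and their sum of squares.
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)

    # Spread of each series around its own mean.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("R and NSE are undefined for a constant series")

    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R": covariance / math.sqrt(ss_obs * ss_pred),
        # NSE = 1 is a perfect fit; NSE <= 0 is no better than the observed mean.
        "NSE": 1.0 - sse / ss_obs,
    }


# Hypothetical values for illustration only (not data from the study).
observed = [8.0, 9.0, 10.0, 11.0, 12.0]
predicted = [8.5, 9.0, 9.5, 11.0, 12.5]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

RMSE and MAE carry the units of the predicted variable, so they are not comparable across DO, TU, and Chl-a, whereas R and NSE are dimensionless. That is one reason an abstract like this one reports all four together.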
Pages: 34
Related papers
50 in total
  • [21] Predicting and Analyzing Road Traffic Injury Severity Using Boosting-Based Ensemble Learning Models with SHAPley Additive exPlanations
    Dong, Sheng
    Khattak, Afaq
    Ullah, Irfan
    Zhou, Jibiao
    Hussain, Arshad
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (05)
  • [22] An explainable predictive model for suicide attempt risk using an ensemble learning and Shapley Additive Explanations (SHAP) approach
    Nordin, Noratikah
    Zainol, Zurinahni
    Noor, Mohd Halim Mohd
    Chan, Lai Fong
    ASIAN JOURNAL OF PSYCHIATRY, 2023, 79
  • [23] A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP)
    Ekanayake, I. U.
    Meddage, D. P. P.
    Rathnayake, Upaka
    CASE STUDIES IN CONSTRUCTION MATERIALS, 2022, 16
  • [24] Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)
    Gebreyesus, Yibrah
    Dalton, Damian
    Nixon, Sebastian
    De Chiara, Davide
    Chinnici, Marta
    FUTURE INTERNET, 2023, 15 (03)
  • [25] Advancing water quality assessment and prediction using machine learning models, coupled with explainable artificial intelligence (XAI) techniques like shapley additive explanations (SHAP) for interpreting the black-box nature
    Makumbura, Randika K.
    Mampitiya, Lakindu
    Rathnayake, Namal
    Meddage, D. P. P.
    Henna, Shagufta
    Dang, Tuan Linh
    Hoshino, Yukinobu
    Rathnayake, Upaka
    RESULTS IN ENGINEERING, 2024, 23
  • [26] A novel framework for lung cancer classification using lightweight convolutional neural networks and ridge extreme learning machine model with SHapley Additive exPlanations (SHAP)
    Nahiduzzaman, Md.
    Abdulrazak, Lway Faisal
    Ayari, Mohamed Arselene
    Khandakar, Amith
    Islam, S. M. Riazul
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [27] Evaluating the Strength and Impact of Raw Ingredients of Cement Mortar Incorporating Waste Glass Powder Using Machine Learning and SHapley Additive ExPlanations (SHAP) Methods
    Alkadhim, Hassan Ali
    Amin, Muhammad Nasir
    Ahmad, Waqas
    Khan, Kaffayatullah
    Nazar, Sohaib
    Faraz, Muhammad Iftikhar
    Imran, Muhammad
    MATERIALS, 2022, 15 (20)
  • [28] Landslide Modeling in a Tropical Mountain Basin Using Machine Learning Algorithms and Shapley Additive Explanations
    Vega, Johnny
    Sepulveda-Murillo, Fabio Humberto
    Parra, Melissa
    AIR SOIL AND WATER RESEARCH, 2023, 16
  • [29] Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP)
    Tadese, Zinabu Bekele
    Hailu, Debela Tsegaye
    Abebe, Aschale Wubete
    Kebede, Shimels Derso
    Walle, Agmasie Damtew
    Seifu, Beminate Lemma
    Nimani, Teshome Demis
    DIGITAL HEALTH, 2024, 10
  • [30] Investigation of characteristic values in TDR waveform using SHapley Additive exPlanations (SHAP) for dielectric constant estimation during curing time
    Hong, Won-Taek
    Han, Woojin
    Byun, Yong-Hoon
    Yoon, Hyung-Koo
    SMART STRUCTURES AND SYSTEMS, 2024, 34 (01) : 25 - 32