Interpretation of ensemble learning to predict water quality using explainable artificial intelligence

Cited by: 104
Authors
Park, Jungsu [1]
Lee, Woo Hyoung [2]
Kim, Keug Tae [3]
Park, Cheol Young [4]
Lee, Sanghun [5]
Heo, Tae-Young [5]
Affiliations
[1] Hanbat Natl Univ, Dept Civil & Environm Engn, 125 Dongseo Daero, Daejeon 34158, South Korea
[2] Univ Cent Florida, Dept Civil Environm & Construct Engn, 12800 Pegasus Dr, Orlando, FL 32816 USA
[3] Univ Suwon, Dept Environm & Energy Engn, 17 Wauan Gil, Hwaseong Si 18323, Gyeonggi Do, South Korea
[4] BAIES, Bayesian AI Lab, Fairfax, VA 22030 USA
[5] Chungbuk Natl Univ, Dept Informat & Stat, Chungdae Ro 1, Cheongju 28644, Chungbuk, South Korea
Funding
National Research Foundation of Singapore
Keywords
Algal management; Ensemble model; Machine learning; Water quality; XGBoost; NEURAL-NETWORKS; BLACK-BOX; BIOMASS; MODEL; MICROCYSTIS; INFORMATION; LAKE
DOI
10.1016/j.scitotenv.2022.155070
Chinese Library Classification
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Algal blooms are a significant issue in freshwater water quality management; in particular, predicting algal concentration is essential to maintaining the safety of drinking water supply systems. Chlorophyll-a (Chl-a) concentration is a commonly used indicator for estimating algal concentration. In this study, an XGBoost ensemble machine learning (ML) model was developed with eighteen input variables to predict Chl-a concentration. The composition and pretreatment of the input variables are important factors in improving model performance. Explainable artificial intelligence (XAI) is an emerging area of ML modeling that provides a reasonable interpretation of model performance. The effect of input variable selection on model performance was evaluated, with the priority of input variables determined by three indices: Shapley additive explanations (SHAP), feature importance (FI), and the variance inflation factor (VIF). SHAP is an XAI method that consistently computes the relative importance of input variables, providing an interpretable analysis of model predictions. The XGB models built with the independent variables selected by each of the three indices were evaluated using the root mean square error (RMSE), the RMSE-observation standard deviation ratio (RSR), and the Nash-Sutcliffe efficiency (NSE). This study shows that the model performed most stably when the priority of input variables was determined by SHAP. This implies that on-site monitoring could be designed to collect only the input variables selected by the SHAP analysis, reducing the overall cost of water quality analysis. The independent variables were further analyzed using SHAP summary plots, force plots, target plots, and partial dependence plots to provide an understandable interpretation of the XGB model's performance.
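The evaluation metrics and the VIF index named in the abstract have standard definitions, sketched below in plain Python so the snippet runs without third-party packages. This is not the authors' code. The function names are hypothetical. RSR is taken as RMSE divided by the population standard deviation of the observations. VIF is computed as 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares fit of variable j on all the other input variables plus an intercept.

```python
import math
from statistics import pstdev


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def rsr(obs, sim):
    """RMSE-observation standard deviation ratio: RMSE / population SD of observations."""
    return rmse(obs, sim) / pstdev(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(ata, aty)  # normal equations: (A^T A) beta = A^T y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each input column (assumes no exact collinearity)."""
    return [1.0 / (1.0 - _r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]
```

With these definitions NSE = 1 - RSR^2. A model that only predicts the observed mean therefore scores NSE = 0 and RSR = 1. A VIF near 1 means a variable is barely explained linearly by the other inputs, while large values flag collinearity.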
Although XAI is still at an early stage of development, this study demonstrates a successful application of XAI to improve the interpretation of ML model performance in predicting water quality.
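The Shapley values behind SHAP can be illustrated by enumerating every feature coalition for a tiny model. The sketch below is a simplified illustration, not the shap library or the study's XGBoost model. `toy_model` and its three inputs are hypothetical. Features outside a coalition are fixed at a single baseline point, whereas the shap library typically estimates coalition values from background data or, for tree ensembles, from the tree structure. Exhaustive enumeration needs on the order of 2^n model calls per feature, which is why practical tools use specialized algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline point.

    Simplification: features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        # Model output with coalition features taken from x and the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained regressor: main effects plus one interaction.
    temp, nutrient, turbidity = z
    return 0.5 * temp + 2.0 * nutrient + 0.1 * temp * nutrient - 0.3 * turbidity


if __name__ == "__main__":
    x = [25.0, 1.5, 10.0]        # hypothetical sample to explain
    baseline = [15.0, 0.5, 5.0]  # hypothetical reference point
    phi = shapley_values(toy_model, x, baseline)
    print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

By the efficiency property, the returned values sum to `toy_model(x) - toy_model(baseline)`. A feature that enters the model purely additively (here `turbidity`) receives just its own additive contribution, while the interaction term is split between `temp` and `nutrient`.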
Pages: 12