Interpretable machine learning on large samples for supporting runoff estimation in ungauged basins

Cited by: 18
Authors
Xu, Yuanhao [1]
Lin, Kairong [1]
Hu, Caihong [2]
Wang, Shuli [3]
Wu, Qiang [2]
Zhang, Jingwen [1]
Xiao, Mingzhong [1]
Luo, Yufu [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Civil Engn, State Key Lab Tunnel Engn, Guangzhou 510275, Peoples R China
[2] Zhengzhou Univ, Sch Water Conservancy Sci & Engn, Zhengzhou 450000, Peoples R China
[3] Changan Univ, Sch Water & Environm, Xian 710061, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Prediction in ungauged basins; Interpretable machine learning; XGBoost; Shapley additive explanation; Rainfall-runoff; HYDROMETEOROLOGICAL TIME-SERIES; HYDROLOGICAL MODEL PARAMETERS; LANDSCAPE ATTRIBUTES; REGIONALIZATION METHODS; STREAMFLOW ESTIMATION; CATCHMENT ATTRIBUTES; GLOBAL OPTIMIZATION; PREDICTION; CALIBRATION; METEOROLOGY
DOI
10.1016/j.jhydrol.2024.131598
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
Flowmeter data and basin characteristic information are very unevenly distributed, with most flow observations recorded at a limited number of well-monitored locations. Achieving reliable and robust hydrological modeling in ungauged catchments through regionalization remains a perennial challenge. The increasing availability of large-scale hydrological datasets, coupled with recent advances in machine learning, offers new opportunities to explore how basin attributes relate to hydrological parameters and thereby improve streamflow predictions. This study proposes a novel cross-regional parameter transfer approach based on interpretable machine learning (XGBoost), which predicts runoff processes in ungauged regions by leveraging well-trained models across numerous basins within climate zones. The framework is validated on 5,764 basins from a large-sample dataset (Caravan), with performance assessed by Nash-Sutcliffe Efficiency (NSE), RMSE and bias, and it is compared with deep transfer learning based on LSTM and Transformer models. Results indicate that the proposed method achieves NSE values exceeding 0.2 for 75% of the ungauged basins, with better and more stable accuracy than pure deep learning models, which the authors attribute to its incorporation of physical constraints. Furthermore, SHAP values are used to show how parameters respond to basin attributes in different climatic zones in the large-sample context, enriching the understanding of hydrological features through data-driven inverse inference. These findings underscore the capability of interpretable machine learning to exploit hydro-physical regularities extracted from abundant basin features, thereby improving the accuracy of runoff predictions in ungauged regions.
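The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function names and the sample series below are hypothetical, and the record does not state how the paper defines bias, so the relative-bias formula used here (total simulated minus total observed flow, divided by total observed flow) is an assumption rather than the authors' definition.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the squared simulation
    error to the variance of the observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root mean squared error between observed and simulated flow."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def relative_bias(obs, sim):
    """Assumed bias definition: (total simulated - total observed) / total observed."""
    return (sum(sim) - sum(obs)) / sum(obs)


# Hypothetical daily runoff values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.0, 2.5, 4.0]

print(nse(observed, simulated))
print(rmse(observed, simulated))
print(relative_bias(observed, simulated))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than predicting the observed mean, which gives context for the 0.2 threshold reported in the abstract.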
Pages: 13