Design flood estimation using extreme Gradient Boosting-based on Bayesian optimization

Times cited: 19
Authors
Jarajapu, Deva Charan [1]
Rathinasamy, Maheswaran [2]
Agarwal, Ankit [3]
Bronstert, Axel [1]
Affiliations
[1] Univ Potsdam, Inst Environm Sci & Geog, D-14476 Potsdam, Germany
[2] Indian Inst Technol Hyderabad, Dept Civil Engn, Kandi 502285, Telangana, India
[3] Indian Inst Technol Roorkee, Dept Hydrol, Roorkee 247667, Uttaranchal, India
Keywords
Regional flood frequency analysis; XGB; Ungauged catchments; CAMELS dataset; Artificial neural network; Multiple linear regression; Frequency analysis; Ungauged sites; Quantile regression; Cross-validation; Model; Precipitation; Temperature; Machine
DOI
10.1016/j.jhydrol.2022.128341
Chinese Library Classification (CLC) number
TU [Building science]
Discipline classification code
0813
Abstract
Regional Flood Frequency Analysis (RFFA) is one of the most widely used approaches for estimating design floods in ungauged basins. We developed an eXtreme Gradient Boosting (XGB) machine learning model for RFFA and flood estimation. Our approach builds a regression model between flood quantiles and commonly available catchment descriptors. We used CAMELS data for 671 catchments in the USA to test the approach's efficacy. The results were compared with those of traditional Multiple Linear Regression methods and Artificial Neural Networks. The XGB-based approach estimated design floods with the highest accuracy during both training and validation, with low mean absolute error and root mean square error values and a percentage bias ranging from -10% to +10%. The importance of each catchment feature is visualized using three approaches: Gini impurity, permutation, and dropout-loss feature ranking. The most dominant variables are rainfall intensity, slope, snow fraction, soil porosity, and temperature. The importance of these variables depends on the hydroclimatic region and therefore varies in space. In contrast, mean annual areal potential evapotranspiration, mean annual rainfall, forest area fraction, and soil conductivity have low significance in estimating design floods for ungauged catchments. The proposed XGB-based approach therefore has broader applicability and replicability.
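The abstract reports three evaluation measures (mean absolute error, root mean square error, percentage bias) and a permutation-based feature ranking. A minimal Python sketch of both follows, assuming plain lists of observed and predicted flood quantiles. The function names are hypothetical, the PBIAS sign convention is an assumption (sources differ), and `predict` is a placeholder for a trained XGB regressor, not the authors' implementation. The Gini impurity and dropout-loss rankings are not shown.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical helper names for illustration; not taken from the paper.


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error between observed and simulated flood quantiles."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percentage bias, 100 * sum(sim - obs) / sum(obs).

    Assumed sign convention: positive means overestimation. Some sources
    use (obs - sim) instead, which flips the sign.
    """
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in RMSE when one descriptor column is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(list(row)) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between descriptor j and y
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value  # all other descriptors stay intact
                permuted.append(new_row)
            increases.append(rmse(y, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: descriptor 0 drives the "flood quantile", descriptor 1 is unused.
    X = [[float(i), float((7 * i) % 5)] for i in range(20)]
    y = [3.0 * row[0] for row in X]

    def model(row: List[float]) -> float:
        # Placeholder standing in for a trained XGB regressor.
        return 3.0 * row[0]

    print(permutation_importance(model, X, y))
```

Because only one descriptor is shuffled at a time, a descriptor the model ignores scores zero, while an influential one raises the error; this is the sense in which the study ranks variables such as rainfall intensity above soil conductivity.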
Pages: 16