Harnessing ensemble Machine learning models for improved salinity prediction in large river basin scales

Cited by: 0
Authors
Mahmoud, Mohamed F. [1]
Arabi, Mazdak [1]
Pallickara, Shrideep [2]
Affiliations
[1] Colorado State Univ, Dept Civil & Environm Engn, 1372 Campus Delivery, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO USA
Funding
U.S. National Science Foundation;
Keywords
Machine learning; Bayesian model averaging; Spatial prediction; Stacked ensembles; XGBoost; Colorado River Basin; Salinity prediction; NEURAL-NETWORKS; COLORADO RIVER; REGRESSION; CLASSIFICATION; TREES;
DOI
10.1016/j.jhydrol.2025.132691
Chinese Library Classification
TU [Building Science];
Discipline classification code
0813;
Abstract
This study develops a robust ensemble machine learning methodology for predicting average annual salinity by combining multiple machine learning algorithms. Salt concentration is a crucial water quality indicator, and salinity issues cost $300 million annually in the U.S. Irrigated agricultural lands in the Upper Colorado River Basin contribute disproportionately to dissolved solid loads despite covering less than 2% of the basin area. The economic impact and the complex relationship between irrigation practices, groundwater dynamics, and salinity levels necessitate improved predictive capabilities at river basin scales. Using twenty years of data from 150 watersheds, eleven machine learning algorithms were evaluated through both random and spatial cross-validation, with Extreme Gradient Boosting (XGBoost), Gradient Boosting, and Random Forest emerging as the top performers. Bayesian Model Averaging (BMA) and stacked generalization were then used to build ensemble models, which improved predictive performance. The BMA ensemble generalized better spatially than the individual models while requiring significantly fewer computational resources than stacking. Model uncertainty analysis revealed that BMA provided the most stable predictions among all approaches. Soil electrical conductivity and calcium carbonate content emerged as the most important predictors, followed by river flow. The resulting spatially distributed predictions revealed distinct patterns in sulfate loads and concentrations across sub-basins, providing insights for targeted salinity management. This study demonstrates the effectiveness of ensemble machine learning approaches for robust salinity prediction while highlighting the importance of comprehensive uncertainty assessment and spatial validation in environmental modeling applications.
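The two ideas the abstract leans on, spatial cross-validation and model averaging, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how folds were built or how the BMA weights were estimated. The sketch assumes that whole watersheds are assigned to folds, and it swaps in a common BIC-based approximation for the posterior model weights (weight proportional to exp(-BIC/2)). The function names `spatial_folds`, `bic_weights`, and `bma_predict` are hypothetical.

```python
import math
from collections import defaultdict


def spatial_folds(group_ids, n_folds):
    """Split sample indices into folds without splitting any group.

    group_ids[i] is the (hypothetical) watershed label of sample i.
    Every sample from one watershed lands in the same fold, so a model
    is always validated on watersheds it did not see in training.
    """
    groups = sorted(set(group_ids))
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    folds = defaultdict(list)
    for idx, g in enumerate(group_ids):
        folds[fold_of_group[g]].append(idx)
    return [folds[k] for k in range(n_folds)]


def bic_weights(y_true, predictions, n_params):
    """Approximate posterior model weights from BIC (an assumption here).

    predictions[k] holds model k's held-out predictions and n_params[k]
    its parameter count. Weights are proportional to exp(-0.5 * BIC_k).
    """
    n = len(y_true)
    bics = []
    for preds, p in zip(predictions, n_params):
        rss = sum((y - f) ** 2 for y, f in zip(y_true, preds))
        rss = max(rss, 1e-12)  # guard against log(0) for a perfect fit
        bics.append(n * math.log(rss / n) + p * math.log(n))
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(weights, predictions):
    """Weighted average of the member predictions, sample by sample."""
    n = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data: six samples from three watersheds.
    watersheds = ["a", "a", "b", "b", "c", "c"]
    print(spatial_folds(watersheds, 3))

    y = [1.0, 2.0, 3.0, 4.0]
    model_a = [1.1, 1.9, 3.1, 3.9]   # close fit
    model_b = [2.0, 2.0, 2.0, 2.0]   # poor fit
    w = bic_weights(y, [model_a, model_b], n_params=[2, 2])
    print(w, bma_predict(w, [model_a, model_b]))
```

A random split, in contrast, can place samples from the same watershed in both training and validation folds, which tends to overstate how well a model transfers to unmonitored locations. That is why the grouped split is the more demanding test of spatial generalization.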
Pages: 15