Comparing the performance of 10 machine learning models in predicting Chlorophyll a in western Lake Erie

Authors
Song, Yang [1]
Shen, Chunqi [2]
Hong, Yi [1]
Affiliations
[1] Univ Michigan, Cooperat Inst Great Lakes Res, Sch Environm & Sustainabil, Ann Arbor, MI 48109 USA
[2] Yale Univ, Yale Sch Environm, New Haven, CT 06511 USA
Keywords
Phytoplankton; Eutrophication; Machine learning; Model comparison; Preprocessing; Key predictor; Algal blooms
DOI
10.1016/j.jenvman.2025.125007
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Algal blooms, which have substantial adverse effects, are occurring more frequently worldwide under global warming and eutrophication. Machine learning models (MLMs) are emerging as efficient and promising tools for predicting algal blooms. However, the performance of MLMs in directly simulating algal blooms has seldom been reported, particularly in the world's largest freshwater system, the Great Lakes. To address this gap, we compared 10 popular MLMs in predicting Chlorophyll a (Chl a, a proxy for algal biomass) concentration in western Lake Erie, using 15 water quality parameters measured from 2012 to 2022. The results show that outlier removal is essential: it noticeably improves prediction accuracy, for example raising the coefficient of determination (R²) from 0.35 to 0.84 (a 140 % increase) for the best-performing Gradient Boosting Decision Trees (GBDT) model. All 32,767 feature combinations of the measured water quality parameters were exhaustively tested for each MLM, and the best feature combination for each was identified. MLMs benefit from this feature selection, most notably the Polynomial Regression model, whose R² increased from 0.71 to 0.82 (15 %) compared with no feature selection. The tree-based ensemble models, GBDT (R² = 0.84) and Random Forest (R² = 0.82), were the two best performers in predicting Chl a. Feature importance analysis identified particulate organic nitrogen (PON) as the most critical water quality parameter for predicting Chl a. These results establish a benchmark for the performance of common MLMs in predicting Chl a in western Lake Erie. The best feature combinations identified here could make water quality observations more effective and targeted, thereby benefiting sustainable water quality management.
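The figures quoted in the abstract can be checked with a short sketch. It is not the authors' code: the feature names (`param_1` to `param_15`) and both helper functions are hypothetical placeholders. It only illustrates that 15 parameters yield 32,767 non-empty feature combinations and that the reported R² gains correspond to the stated percentages.

```python
from itertools import combinations


def nonempty_subsets(features):
    """Yield every non-empty combination of the given feature names."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def relative_gain_percent(before, after):
    """Relative improvement from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical names standing in for the 15 measured water quality parameters.
features = [f"param_{i}" for i in range(1, 16)]

# Exhaustive search space: every non-empty subset, i.e. 2**15 - 1 combinations.
n_subsets = sum(1 for _ in nonempty_subsets(features))

# R2 values as reported in the abstract.
gbdt_gain = relative_gain_percent(0.35, 0.84)   # outlier removal, GBDT
poly_gain = relative_gain_percent(0.71, 0.82)   # feature selection, Polynomial Regression

print(n_subsets)
print(round(gbdt_gain), round(poly_gain))
```

In an actual comparison, each yielded combination would be used to select columns before fitting and scoring a model; that step is omitted here because the abstract does not describe the training or validation setup.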
Pages: 8