Comparing the performance of 10 machine learning models in predicting Chlorophyll a in western Lake Erie

Cited by: 0
Authors
Song, Yang [1]
Shen, Chunqi [2]
Hong, Yi [1]
Affiliations
[1] Univ Michigan, Cooperat Inst Great Lakes Res, Sch Environm & Sustainabil, Ann Arbor, MI 48109 USA
[2] Yale Univ, Yale Sch Environm, New Haven, CT 06511 USA
Keywords
Phytoplankton; Eutrophication; Machine learning; Model comparison; Preprocessing; Key predictor; Algal blooms
DOI
10.1016/j.jenvman.2025.125007
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Algal blooms, which have substantial adverse effects, are occurring with increasing frequency worldwide under global warming and eutrophication. Machine learning models (MLMs) are emerging as efficient and promising tools for predicting algal blooms. However, the performance of MLMs in directly simulating algal blooms has seldom been reported, particularly for the world's largest freshwater system, the Great Lakes. To address this gap, we compared how well 10 popular MLMs predict Chlorophyll a (Chl a, a proxy for algal biomass) concentration in western Lake Erie, using 15 measured water quality parameters collected from 2012 to 2022. The results show that outlier removal is essential: it noticeably improves prediction accuracy, for example raising the coefficient of determination (R²) from 0.35 to 0.84 (a 140 % increase) for the best-performing model, Gradient Boosting Decision Trees (GBDT). All 32,767 feature combinations of the measured water quality parameters were exhaustively tested for each MLM, and the best feature combinations were identified. MLMs benefit from this feature selection, with the Polynomial Regression model showing a notable improvement: its R² increased from 0.71 to 0.82 (15 %) relative to no feature selection. The tree-based ensemble models, GBDT (R² = 0.84) and Random Forest (R² = 0.82), were the two best performers in predicting Chl a. Based on feature importance analysis, particulate organic nitrogen (PON) was the most critical water quality parameter for predicting Chl a. These results establish a benchmark for the performance of common MLMs in predicting Chl a in western Lake Erie. The identified best feature combinations could make water quality observations more effective and targeted, thereby benefiting sustainable water quality management.
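The figures quoted in the abstract can be checked with a short sketch. It assumes that the 32,767 combinations are the non-empty subsets of the 15 parameters (2^15 − 1) and that the percentages are relative changes in the coefficient of determination. The feature names `param_1` … `param_15` and the helpers `nonempty_subsets`, `relative_gain_percent`, and `best_subset` are hypothetical illustrations, not the authors' code; in a real study the `score` argument would be something like a cross-validated fit metric for one model.

```python
from itertools import combinations


def nonempty_subsets(features):
    """Yield every non-empty combination of the given feature names."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def relative_gain_percent(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


def best_subset(features, score):
    """Exhaustive search: return the subset with the highest score.

    `score` is a caller-supplied function mapping a tuple of feature
    names to a number (an assumption; the abstract does not say how
    each combination was scored).
    """
    return max(nonempty_subsets(features), key=score)


# Placeholder names: the abstract does not list the 15 parameters.
features = [f"param_{i}" for i in range(1, 16)]
n_subsets = sum(1 for _ in nonempty_subsets(features))

print(n_subsets)                                  # 32767
print(round(relative_gain_percent(0.35, 0.84)))   # 140
print(round(relative_gain_percent(0.71, 0.82)))   # 15
```

Exhaustive search is feasible here only because 15 parameters give a few tens of thousands of subsets per model; the count doubles with every additional parameter.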
Pages: 8
Related papers
50 in total
  • [21] Machine learning modeling of lake chlorophyll in a data-scarce region (Northern Patagonia, Chile): insights for environmental monitoring
    Caputo, Luciano
    Molina, Cristian Rios
    Ayllon-Arauco, Roxanna
    Benavides, Ivan Felipe
    INLAND WATERS, 2024, 14 (1-2) : 83 - 96
  • [22] Applying Machine Learning Models on Metrology Data for Predicting Device Electrical Performance
    Dey, Bappaditya
    Anh Tuan Ngo
    Sacchi, Sara
    Blanco, Victor
    Leray, Philippe
    Halder, Sandip
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT IV, 2025, 2136 : 435 - 453
  • [23] Comparing Performance of Different Machine Learning Methods for Predicting Severity of Construction Work Zone Crashes
    Mashhadi, Ali Hassandokht
    Rashidi, Abbas
    Medina, Juan
    Markovic, Nikola
    COMPUTING IN CIVIL ENGINEERING 2023-RESILIENCE, SAFETY, AND SUSTAINABILITY, 2024 : 434 - 442
  • [24] Machine Learning Models predicting Decompensation in Cirrhosis
    Mueller, Sophie Elisabeth
    Casper, Markus
    Ripoll, Cristina
    Zipprich, Alexander
    Horn, Paul
    Krawczyk, Marcin
    Lammert, Frank
    Reichert, Matthias Christian
    JOURNAL OF GASTROINTESTINAL AND LIVER DISEASES, 2025, 34 (01) : 71 - 80
  • [25] MACHINE LEARNING MODELS FOR PREDICTING SUCCESS OF STARTUPS
    Rodrigues, Fabiano
    Rodrigues, Francisco Aparecido
    Rocha Rodrigues, Thelma Valeria
    REVISTA DE GESTAO E PROJETOS, 2021, 12 (02) : 28 - 55
  • [26] Comparing Machine Learning Models and Statistical Models for Predicting Heart Failure Events: A Systematic Review and Meta-Analysis
    Sun, Zhoujian
    Dong, Wei
    Shi, Hanrui
    Ma, Hong
    Cheng, Lechao
    Huang, Zhengxing
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, 9
  • [27] Using machine learning to reveal seasonal nutrient dynamics and their impact on chlorophyll-a levels in lake ecosystems: A focus on nitrogen and phosphorus
    Fang, Yong
    Huang, Ruting
    Shi, Xianyang
    ECOLOGICAL INDICATORS, 2024, 169
  • [28] Advancing lakes algal chlorophyll estimation in the contiguous USA: A comparative study of machine learning models and satellite data
    Mamun, Md
    Yang, Xiao
    ECOLOGICAL INFORMATICS, 2025, 87
  • [29] Predicting Runway Configurations and Arrival and Departure Rates at Airports: Comparing the Accuracy of Multiple Machine Learning Models
    Raju, Ramakrishna
    Mital, Rohit
    Wilson, Bruce
    Shetty, Kamala
    Albert, Michael
    2021 IEEE/AIAA 40TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), 2021
  • [30] Chlorophyll soft-sensor based on machine learning models for algal bloom predictions
    Mozo, Alberto
    Moron-Lopez, Jesus
    Vakaruk, Stanislav
    Pompa-Pernia, Angel G.
    Gonzalez-Prieto, Angel
    Pascual Aguilar, Juan Antonio
    Gomez-Canaval, Sandra
    Manuel Ortiz, Juan
    SCIENTIFIC REPORTS, 2022, 12 (01)