Comparing the performance of 10 machine learning models in predicting Chlorophyll a in western Lake Erie

Cited by: 0
Authors
Song, Yang [1]
Shen, Chunqi [2]
Hong, Yi [1]
Affiliations
[1] Univ Michigan, Cooperat Inst Great Lakes Res, Sch Environm & Sustainabil, Ann Arbor, MI 48109 USA
[2] Yale Univ, Yale Sch Environm, New Haven, CT 06511 USA
Keywords
Phytoplankton; Eutrophication; Machine learning; Model comparison; Preprocessing; Key predictor; ALGAL BLOOMS; EUTROPHICATION
DOI
10.1016/j.jenvman.2025.125007
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Algal blooms, which have substantial adverse effects, are occurring more frequently worldwide under global warming and eutrophication. Machine learning models (MLMs) are emerging as efficient and promising tools for predicting algal blooms. However, the performance of MLMs in directly simulating algal blooms has seldom been reported, particularly in the world's largest freshwater system, the Great Lakes. To address this gap, we compared how well 10 popular MLMs predict Chlorophyll a (Chl a, a proxy for algal biomass) concentration in western Lake Erie, using 15 measured water quality parameters collected from 2012 to 2022. The results show that outlier removal is essential: it noticeably improves prediction accuracy, for example raising the coefficient of determination (R2) from 0.35 to 0.84 (a 140 % relative increase) for the optimal Gradient Boosting Decision Trees (GBDT) model. All 32,767 feature combinations of the measured water quality parameters were exhaustively tested for each MLM, and the best feature combinations were identified. MLMs benefit from this feature selection, with the Polynomial Regression model improving notably: its R2 increased from 0.71 to 0.82 (15 %) compared to no feature selection. The tree-based ensemble models, GBDT (R2 = 0.84) and Random Forest (R2 = 0.82), were the two best performers in predicting Chl a. Based on feature importance analysis, particulate organic nitrogen (PON) was the most critical water quality parameter for predicting Chl a. These results establish a benchmark for the performance of common MLMs in predicting Chl a in western Lake Erie. The best feature combinations identified here could make water quality observations more effective and targeted, thereby benefiting sustainable water quality management.
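The figures quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' pipeline: it assumes that "all feature combinations" means every non-empty subset of the 15 parameters, and that the percentages are relative changes in R2. The parameter names are placeholders, since the abstract names only PON.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def relative_gain_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Placeholder names (assumption): the abstract does not list the 15 parameters.
features = [f"param_{i}" for i in range(1, 16)]

# Count the subsets an exhaustive search would have to evaluate per model.
n_subsets = sum(1 for _ in all_feature_subsets(features))
print(n_subsets)  # 32767, i.e. 2**15 - 1

# Relative R2 gains reported in the abstract.
print(round(relative_gain_percent(0.35, 0.84)))  # 140 (outlier removal, GBDT)
print(round(relative_gain_percent(0.71, 0.82)))  # 15 (feature selection, Polynomial Regression)
```

In an actual search, each tuple yielded by `all_feature_subsets` would select the columns used to fit and score one model, so the cost grows as 2^n in the number of parameters; 15 parameters is still small enough to enumerate fully.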
Pages: 8