Comparing the performance of 10 machine learning models in predicting Chlorophyll a in western Lake Erie

Authors
Song, Yang [1]
Shen, Chunqi [2]
Hong, Yi [1]
Affiliations
[1] Univ Michigan, Cooperat Inst Great Lakes Res, Sch Environm & Sustainabil, Ann Arbor, MI 48109 USA
[2] Yale Univ, Yale Sch Environm, New Haven, CT 06511 USA
Keywords
Phytoplankton; Eutrophication; Machine learning; Model comparison; Preprocessing; Key predictor; Algal blooms
DOI
10.1016/j.jenvman.2025.125007
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Algal blooms, which have substantial adverse effects, are occurring increasingly often worldwide in the context of global warming and eutrophication. Machine learning models (MLMs) are emerging as efficient and promising tools for predicting algal blooms. However, the performance of MLMs in directly simulating algal blooms has seldom been reported, particularly in the world's largest freshwater system, the Great Lakes. To address this gap, we compared the performance of 10 popular MLMs in predicting the concentration of Chlorophyll a (Chl a, a proxy for algal biomass) in western Lake Erie, using 15 measured water quality parameters collected from 2012 to 2022. The results show that outlier removal is essential: it noticeably improves prediction accuracy, for example raising the coefficient of determination (R²) from 0.35 to 0.84 (a 140% increase) for the best-performing model, Gradient Boosting Decision Trees (GBDT). All 32,767 feature combinations of the measured water quality parameters were exhaustively tested for each MLM, and the best feature combination for each was identified. MLMs benefit from this feature selection, with the Polynomial Regression model showing a notable improvement: its R² increased from 0.71 to 0.82 (a 15% increase) compared with no feature selection. The tree-based ensemble models, GBDT (R² = 0.84) and Random Forest (R² = 0.82), were the two best performers in predicting Chl a. Based on feature importance analysis, particulate organic nitrogen (PON) was determined to be the most critical water quality parameter for predicting Chl a. These results establish a benchmark for the performance of common MLMs in predicting Chl a in western Lake Erie. The best feature combinations identified here could make water quality observations more effective and targeted, thereby benefiting sustainable water quality management.
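The arithmetic in the abstract can be checked with a short sketch. It is only an illustration under stated assumptions, not the authors' code: the parameter names (`param_1` to `param_15`) are placeholders, since the abstract does not list the 15 measured parameters, and the helper functions are hypothetical. It shows that 15 parameters give 2^15 − 1 = 32,767 non-empty feature combinations, how an exhaustive search over them could be organized, and how the reported relative R² gains follow from the before and after values.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of the given feature names."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def best_subset(features, score):
    """Exhaustive search: return the subset with the highest score.

    `score` is any callable mapping a tuple of feature names to a number,
    e.g. a cross-validated R² for one model (assumed, not from the paper).
    """
    return max(all_feature_subsets(features), key=score)


def relative_gain_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Placeholder names: the abstract does not name the 15 parameters.
features = [f"param_{i}" for i in range(1, 16)]
n_subsets = sum(1 for _ in all_feature_subsets(features))

print(n_subsets)                                 # 32767, i.e. 2**15 - 1
print(round(relative_gain_percent(0.35, 0.84)))  # 140 (outlier removal, GBDT)
print(round(relative_gain_percent(0.71, 0.82)))  # 15 (feature selection, Polynomial Regression)
```

In practice `score` would wrap model training and validation, so the cost of the search is 32,767 model fits per MLM; the sketch leaves that part out because the abstract does not describe the validation scheme.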
Pages: 8