Do machine learning methods improve prediction of ambient air pollutants with high spatial contrast? A systematic review

Cited by: 2
Authors
Vachon, Julien [1, 2, 3]
Kerckhoffs, Jules [4]
Buteau, Stephane [1, 2, 3]
Smargiassi, Audrey [1, 2, 3]
Affiliations
[1] Univ Montreal, Sch Publ Hlth, Dept Environm & Occupat Hlth, 7101 Ave Parc,Local 3259, Montreal, PQ, Canada
[2] Univ Montreal, Ctr Publ Hlth Res CReSP, Montreal, PQ, Canada
[3] CIUSSS Ctr Sud Delile De Montreal, Montreal, PQ, Canada
[4] Univ Utrecht, Inst Risk Assessment Sci, Utrecht, Netherlands
Keywords
Machine learning; Spatial-temporal prediction; Exposure assessment; UFPs; BC; NO2; USE REGRESSION-MODELS; ULTRAFINE PARTICLES; POLLUTION EXPOSURE; PARTICULATE MATTER; NO2; MOBILE; VARIABILITY;
DOI
10.1016/j.envres.2024.119751
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Background & objective: The use of machine learning for air pollution modelling is rapidly increasing. We conducted a systematic review of studies comparing statistical and machine learning models predicting the spatiotemporal variation of ambient nitrogen dioxide (NO2), ultrafine particles (UFPs) and black carbon (BC) to determine whether and in which scenarios machine learning generates more accurate predictions.
Methods: Web of Science and Scopus were searched up to June 13, 2024. All records were screened by two independent reviewers. Differences in the coefficient of determination (R²) and Root Mean Square Error (RMSE) between best statistical and machine learning methods were compared across categories of methodological elements.
Results: A total of 38 studies with 46 model comparisons (30 for NO2, 8 for UFPs and 8 for BC) were included. Linear non-regularized methods and Random Forest were most frequently used. Machine learning outperformed statistical models in 34 comparisons. Mean differences (95% confidence intervals) in R² and RMSE between best machine learning and statistical models were 0.12 (0.08, 0.17) and 20% (11%, 29%) respectively. Tree-based methods performed best in 12 of 17 multi-model comparisons. Nonlinear or regularization regression methods were used in only 12 comparisons and provided similar performance to machine learning methods.
Conclusion: This systematic review suggests that machine learning methods, especially tree-based methods, may be superior to linear non-regularized methods for predicting ambient concentrations of NO2, UFPs and BC. Additional comparison studies using nonlinear, regularized and a wider array of machine learning methods are needed to confirm their relative performance. Future air pollution studies would also benefit from more explicit and standardized reporting of methodologies and results.
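As an illustration of the comparison metrics named in the abstract, the following minimal sketch computes R² and RMSE for one statistical and one machine learning model on the same observations, then takes their differences. The function names, the sample values, and the choice to express the RMSE difference as a percentage reduction relative to the statistical model are assumptions made for illustration; the abstract does not state how the review computed its relative RMSE differences.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def compare_models(observed, pred_statistical, pred_ml):
    """Difference in R2 and percentage RMSE reduction (ML vs. statistical).

    Assumption: the RMSE difference is expressed relative to the
    statistical model's RMSE; the source does not specify this.
    """
    rmse_stat = rmse(observed, pred_statistical)
    rmse_ml = rmse(observed, pred_ml)
    return {
        "r2_difference": r_squared(observed, pred_ml)
        - r_squared(observed, pred_statistical),
        "rmse_reduction_pct": 100.0 * (rmse_stat - rmse_ml) / rmse_stat,
    }


# Hypothetical concentrations and predictions, not data from the review.
observed = [10.0, 20.0, 30.0, 40.0]
pred_statistical = [14.0, 16.0, 34.0, 36.0]
pred_ml = [12.0, 18.0, 32.0, 38.0]

result = compare_models(observed, pred_statistical, pred_ml)
print(result)
```

A positive `r2_difference` or `rmse_reduction_pct` in this sketch means the machine learning model fit the held-out observations better than the statistical model.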
Pages: 13