Do machine learning methods improve prediction of ambient air pollutants with high spatial contrast? A systematic review

Cited by: 2

Authors:

Vachon, Julien ^{[1,2,3]}

Kerckhoffs, Jules ^{[4]}

Buteau, Stephane ^{[1,2,3]}

Smargiassi, Audrey ^{[1,2,3]}

Affiliations:

[1] Univ Montreal, Sch Publ Hlth, Dept Environm & Occupat Hlth, 7101 Ave Parc,Local 3259, Montreal, PQ, Canada

[2] Univ Montreal, Ctr Publ Hlth Res CReSP, Montreal, PQ, Canada

[3] CIUSSS Ctr Sud Delile De Montreal, Montreal, PQ, Canada

[4] Univ Utrecht, Inst Risk Assessment Sci, Utrecht, Netherlands

Source:

ENVIRONMENTAL RESEARCH | 2024 / Vol. 262

Keywords:

Machine learning; Spatial-temporal prediction; Exposure assessment; UFPs; BC; NO2; USE REGRESSION-MODELS; ULTRAFINE PARTICLES; POLLUTION EXPOSURE; PARTICULATE MATTER; NO2; MOBILE; VARIABILITY;

DOI:

10.1016/j.envres.2024.119751

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Background & objective: The use of machine learning for air pollution modelling is rapidly increasing. We conducted a systematic review of studies comparing statistical and machine learning models predicting the spatiotemporal variation of ambient nitrogen dioxide (NO2), ultrafine particles (UFPs) and black carbon (BC) to determine whether and in which scenarios machine learning generates more accurate predictions.

Methods: Web of Science and Scopus were searched up to June 13, 2024. All records were screened by two independent reviewers. Differences in the coefficient of determination (R²) and Root Mean Square Error (RMSE) between best statistical and machine learning methods were compared across categories of methodological elements.

Results: A total of 38 studies with 46 model comparisons (30 for NO2, 8 for UFPs and 8 for BC) were included. Linear non-regularized methods and Random Forest were most frequently used. Machine learning outperformed statistical models in 34 comparisons. Mean differences (95% confidence intervals) in R² and RMSE between best machine learning and statistical models were 0.12 (0.08, 0.17) and 20% (11%, 29%) respectively. Tree-based methods performed best in 12 of 17 multi-model comparisons. Nonlinear or regularization regression methods were used in only 12 comparisons and provided similar performance to machine learning methods.

Conclusion: This systematic review suggests that machine learning methods, especially tree-based methods, may be superior to linear non-regularized methods for predicting ambient concentrations of NO2, UFPs and BC. Additional comparison studies using nonlinear, regularized and a wider array of machine learning methods are needed to confirm their relative performance. Future air pollution studies would also benefit from more explicit and standardized reporting of methodologies and results.
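The two performance metrics named in the abstract, and a mean difference with a 95% confidence interval, can be sketched as follows. This is a minimal illustration and not the review's own procedure: the function names are hypothetical, expressing the RMSE difference as a percentage reduction relative to the statistical model is an assumption, and the normal-approximation interval (z = 1.96) is an assumption, since the abstract does not state how the intervals were derived.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse_reduction(rmse_statistical, rmse_ml):
    """Percentage by which the ML model lowers RMSE (assumed definition)."""
    return 100.0 * (rmse_statistical - rmse_ml) / rmse_statistical


def mean_with_ci(values, z=1.96):
    """Mean with a normal-approximation 95% interval (assumed method)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

In a comparison study, `r_squared` and `rmse` would be evaluated on held-out observations for the best statistical model and the best machine learning model, and `mean_with_ci` would then summarize the per-comparison differences across studies.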


Pages: 13
