Flexible Bayesian Ensemble Machine Learning Framework for Predicting Local Ozone Concentrations

Citations: 36
Authors
Ren, Xiang [1, 2]
Mi, Zhongyuan [1, 3]
Cai, Ting [1]
Nolte, Christopher G. [4]
Georgopoulos, Panos G. [1, 2, 3, 5]
Affiliations
[1] Rutgers State Univ, Environm & Occupat Hlth Sci Inst EOHSI, Piscataway, NJ 08854 USA
[2] Rutgers State Univ, Dept Chem & Biochem Engn, Piscataway, NJ 08854 USA
[3] Rutgers State Univ, Dept Environm Sci, New Brunswick, NJ 08901 USA
[4] US EPA, Ctr Environm Measurement & Modeling, Res Triangle Pk, NC 27711 USA
[5] Rutgers Sch Publ Hlth, Dept Environm & Occupat Hlth & Justice, Piscataway, NJ 08854 USA
Keywords
ozone; interpretable machine learning; data fusion; exposure assessment; environmental and climate justice; USE REGRESSION-MODELS; EXPOSURE ASSESSMENT; PM2.5; RESOLUTION; CHINA
DOI
10.1021/acs.est.1c04076
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
3D-grid-based chemical transport models, such as the Community Multiscale Air Quality (CMAQ) modeling system, have been widely used for predicting concentrations of ambient air pollutants. However, the typical horizontal resolution of nationwide CMAQ simulations (12 km × 12 km) cannot capture the local-scale gradients needed to accurately assess human exposures and environmental justice disparities. In this study, a Bayesian ensemble machine learning (BEML) framework, which integrates 13 learning algorithms, was developed for downscaling CMAQ estimates of ozone daily maximum 8 h averages to the census tract level across the contiguous US, and was demonstrated for 2011. Three-stage hyper-parameter tuning and targeted validations were designed to ensure the ensemble model's ability to interpolate, extrapolate, and capture concentration peaks. The Shapley value metric from coalitional game theory was applied to interpret the drivers of subgrid gradients. The flexibility (transferability) of the 2011-trained BEML model was further tested by evaluating its ability to estimate fine-scale concentrations for other years (2012-2017) without retraining. To demonstrate the feasibility of applying the BEML approach to strictly "data-limited" situations, the model was applied to downscale CMAQ outputs for a future-year scenario-based simulation that considers the effects of variations in meteorology associated with climate change.
Pages: 3871-3883
Page count: 13