An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution

Cited by: 459
Authors
Di, Qian [1 ,2 ]
Amini, Heresh [1 ]
Shi, Liuhua [1 ]
Kloog, Itai [3 ]
Silvern, Rachel [4 ]
Kelly, James [5 ]
Sabath, M. Benjamin [6 ]
Choirat, Christine [6 ]
Koutrakis, Petros [1 ]
Lyapustin, Alexei [7 ]
Wang, Yujie [8 ]
Mickley, Loretta J. [9 ]
Schwartz, Joel [1 ]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Environm Hlth, Boston, MA USA
[2] Tsinghua Univ, Res Ctr Publ Hlth, Beijing, Peoples R China
[3] Ben Gurion Univ Negev, Dept Geog & Environm Dev, Beer Sheva, Israel
[4] Harvard Univ, Dept Earth & Planetary Sci, 20 Oxford St, Cambridge, MA 02138 USA
[5] US EPA, Off Air Qual Planning & Stand, Res Triangle Pk, NC 27711 USA
[6] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[7] NASA, Goddard Space Flight Ctr, Greenbelt, MD USA
[8] Univ Maryland Baltimore Cty, Baltimore, MD 21228 USA
[9] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Keywords
Fine particulate matter (PM2.5); Ensemble model; Neural network; Gradient boosting; Random forest; AEROSOL OPTICAL DEPTH; GROUND-LEVEL PM2.5; PARTICULATE AIR-POLLUTION; USE REGRESSION-MODELS; ISOPRENE EMISSION; ORGANIC AEROSOL; TERM EXPOSURE; SATELLITE; MORTALITY; MODIS;
DOI
10.1016/j.envint.2019.104909
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Various approaches have been proposed to model PM2.5 over the past decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic difference to combine PM2.5 estimates from neural network, random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. The model training results from 2000 to 2015 indicated good model performance, with a 10-fold cross-validated R² of 0.86 for daily PM2.5 predictions. For annual PM2.5 estimates, the cross-validated R² was 0.89. Our model demonstrated good performance up to 60 μg/m³. Using the trained PM2.5 model and predictor variables, we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grids to downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM2.5 for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effects of PM2.5. Compared with model performance of individual base learners, an ensemble model would achieve a better overall estimation. It is worth exploring other ensemble model formats to synthesize estimations from different models or from different groups to improve overall performance.
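The abstract reports a 10-fold cross-validated R² but does not spell out how it is computed. The following is a minimal sketch of one common convention: pool the out-of-fold predictions from all folds, then compute a single R² against the observations. Everything here is an assumption made for illustration: the synthetic data, the single predictor, the function names, and the simple least-squares line that stands in for the paper's neural network, random forest, gradient boosting, and generalized additive model. The authors' actual fold assignment (for example, whether folds were split by monitoring site) is not stated in the abstract.

```python
import random
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Hypothetical stand-in for a real learner; not the paper's model.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Pool out-of-fold predictions over k folds, then compute one R^2."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint folds
    predicted = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Each observation is predicted by a model that never saw it.
        for i in held_out:
            predicted[i] = intercept + slope * xs[i]
    return r_squared(ys, predicted)


# Synthetic, made-up data: one predictor and a noisy "PM2.5" response.
rng = random.Random(42)
predictor = [rng.uniform(0.0, 1.0) for _ in range(200)]
pm25 = [5.0 + 20.0 * x + rng.gauss(0.0, 2.0) for x in predictor]

cv_r2 = cross_validated_r2(predictor, pm25)
print(round(cv_r2, 3))
```

Because every prediction comes from a model fitted without that observation, the pooled R² estimates out-of-sample performance rather than goodness of fit on the training data.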
Pages: 13