A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide

Times cited: 236
Authors
Chen, Jie [1]
de Hoogh, Kees [2, 3]
Gulliver, John [4]
Hoffmann, Barbara [5]
Hertel, Ole [6]
Ketzel, Matthias [6, 7]
Bauwelinck, Mariska [8]
van Donkelaar, Aaron [9]
Hvidtfeldt, Ulla A. [10]
Katsouyanni, Klea [11, 12, 13]
Janssen, Nicole A. H. [14]
Martin, Randall V. [9, 15]
Samoli, Evangelia [11]
Schwartz, Per E. [16]
Stafoggia, Massimo [17, 18]
Bellander, Tom [18]
Strak, Maciek [1]
Wolf, Kathrin [19]
Vienneau, Danielle [2, 3]
Vermeulen, Roel [1, 20]
Brunekreef, Bert [1, 20]
Hoek, Gerard [1]
Affiliations
[1] Univ Utrecht, IRAS, Postbus 80125, NL-3508 TC Utrecht, Netherlands
[2] Swiss Trop & Publ Hlth Inst, Socinstr 57, CH-4051 Basel, Switzerland
[3] Univ Basel, Peterspl 1, CH-4001 Basel, Switzerland
[4] Univ Leicester, Ctr Environm Hlth & Sustainabil, Sch Geog Geol & Environm, Univ Rd, Leicester LE1 7RH, Leics, England
[5] Heinrich Heine Univ Dusseldorf, Inst Occupat Social & Environm Med, Ctr Hlth & Soc, Fac Med, Univ Str 1, D-40225 Dusseldorf, Germany
[6] Aarhus Univ, Dept Environm Sci, POB 358,Frederiksborgvej 399, DK-4000 Roskilde, Denmark
[7] Univ Surrey, Dept Civil & Environm Engn, Global Ctr Clean Air Res GCARE, Guildford GU2 7XH, Surrey, England
[8] Vrije Univ Brussel, Dept Sociol, Interface Demog, Pl Laan 2, B-1050 Brussels, Belgium
[9] Dalhousie Univ, Dept Phys & Atmospher Sci, Halifax, NS B3H 4R2, Canada
[10] Danish Canc Soc, Res Ctr, Strandblvd 49, DK-2100 Copenhagen, Denmark
[11] Natl & Kapodistrian Univ Athens, Dept Hyg Epidemiol & Med Stat, Sch Med, 75 Mikras Asias Str, Athens 11527, Greece
[12] Kings Coll Strand, Dept Populat Hlth Sci, Sch Populat Hlth & Environm Sci, London WC2R 2LS, England
[13] Kings Coll Strand, Dept Analyt Environm & Forens Sci, Sch Populat Hlth & Environm Sci, London WC2R 2LS, England
[14] Natl Inst Publ Hlth & Environm RIVM, POB 1, NL-3720 BA Bilthoven, Netherlands
[15] Harvard Smithsonian Ctr Astrophys, Atom & Mol Phys Div, 60 Garden St, Cambridge, MA 02138 USA
[16] Norwegian Inst Publ Hlth, Div Environm Med, POB 4404 Nydalen, N-0403 Oslo, Norway
[17] Lazio Reg Hlth Serv ASL Roma 1, Dept Epidemiol, Via Cristoforo Colombo 112, I-00147 Rome, Italy
[18] Karolinska Inst, Inst Environm Med, SE-17177 Stockholm, Sweden
[19] German Res Ctr Environm Hlth GmbH, Helmholtz Zentrum Munchen, Inst Epidemiol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[20] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Heidelberglaan 100, NL-3584 CX Utrecht, Netherlands
Funding
United States Environmental Protection Agency
Keywords
Land use regression; Fine particles; Nitrogen dioxide; Machine learning; LAND-USE REGRESSION; DAILY PM2.5 CONCENTRATIONS; AIR-POLLUTION; PARTICULATE MATTER; SPATIOTEMPORAL PREDICTION; EXPOSURE ASSESSMENT; NO2; SATELLITE; ATHEROSCLEROSIS; COMPONENTS
DOI
10.1016/j.envint.2019.104934
Chinese Library Classification
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Empirical spatial air pollution models have been applied extensively to assess exposure in epidemiological studies, increasingly using sophisticated statistical algorithms beyond ordinary linear regression. However, different algorithms have rarely been compared in terms of their predictive ability. This study compared 16 algorithms for predicting annual average fine particle (PM2.5) and nitrogen dioxide (NO2) concentrations across Europe. The evaluated algorithms included linear stepwise regression, regularization techniques and machine learning methods. Air pollution models were developed from the 2010 routine monitoring data in the AIRBASE dataset maintained by the European Environment Agency (543 sites for PM2.5 and 2399 sites for NO2), using satellite observations, dispersion model estimates and land use variables as predictors. We compared the models by five-fold cross-validation (CV) and by external validation (EV) against annual average concentrations measured at 416 (PM2.5) and 1396 (NO2) sites from the ESCAPE study. We further assessed the correlations between the predictions of each pair of algorithms at the ESCAPE sites. For PM2.5, the models performed similarly across algorithms, with a mean CV R² of 0.59 and a mean EV R² of 0.53. Generalized boosted machine, random forest and bagging performed best (CV R² ≈ 0.63; EV R² 0.58-0.61), while backward stepwise linear regression, support vector regression and artificial neural network performed less well (CV R² 0.48-0.57; EV R² 0.39-0.46). Most of the PM2.5 model predictions at ESCAPE sites were highly correlated (R² > 0.85), except for those from the artificial neural network. For NO2, the models performed even more similarly across algorithms, with CV R² values ranging from 0.57 to 0.62 and EV R² values from 0.49 to 0.51. The predicted concentrations from all algorithms at ESCAPE sites were highly correlated (R² > 0.9).
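The validation design described above (five-fold CV on the training sites, external validation at independent sites, and pairwise agreement between algorithms' predictions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it uses simulated data, only two algorithm families (ordinary least squares, and ridge regression as a stand-in for the regularization techniques), NumPy only, and the MSE-based definition of R² (1 - SSE/SST), since the abstract does not say which R² definition was used. The predictors, coefficients and penalty `lam` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1..bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def fit_ridge(X, y, lam=1.0):
    """Ridge regression on standardized predictors; the intercept is not penalized."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    coef = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y.mean()))
    slope = coef / sd                    # back to the original predictor scale
    intercept = y.mean() - mu @ slope
    return np.concatenate([[intercept], slope])

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def r2(obs, pred):
    """MSE-based R^2 (assumed definition): 1 - SSE/SST."""
    obs, pred = np.asarray(obs), np.asarray(pred)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def cv_r2(fit, X, y, k=5, seed=0):
    """k-fold CV: each site is predicted by a model trained without its fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty_like(y, dtype=float)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred[test] = predict(fit(X[train], y[train]), X[test])
    return r2(y, pred)

rng = np.random.default_rng(42)

def simulate(n):
    """Hypothetical sites with three simulated predictors plus noise."""
    X = rng.normal(size=(n, 3))
    y = 10 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=2.0, size=n)
    return X, y

# Site counts borrowed from the PM2.5 design; all values are simulated.
X_train, y_train = simulate(543)   # training (monitoring) sites
X_ext, y_ext = simulate(416)       # independent external-validation sites

models = {"OLS": fit_ols, "ridge": lambda X, y: fit_ridge(X, y, lam=10.0)}
results, ext_pred = {}, {}
for name, fit in models.items():
    ext_pred[name] = predict(fit(X_train, y_train), X_ext)
    results[name] = (cv_r2(fit, X_train, y_train), r2(y_ext, ext_pred[name]))
    print(f"{name}: CV R2 = {results[name][0]:.2f}, EV R2 = {results[name][1]:.2f}")

# Agreement between the two algorithms' predictions at the external sites
corr = np.corrcoef(ext_pred["OLS"], ext_pred["ridge"])[0, 1]
print(f"R2 between OLS and ridge predictions: {corr ** 2:.3f}")
```

Because both models here are linear and fitted to the same predictors on several hundred sites, their external-site predictions are almost perfectly correlated; nonlinear learners such as random forests would need a library (or much more code) and could diverge more.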
For both pollutants, biases were low for all models except the artificial neural network. Dispersion model estimates and satellite observations were two of the most important predictors for PM2.5, whilst dispersion model estimates and traffic variables were the most important for NO2, in all algorithms that allow variable importance to be assessed. Different statistical algorithms performed similarly when modelling spatial variation in annual average air pollution concentrations using a large number of training sites.
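Variable importance can be measured in several ways, and the abstract does not say which measure each algorithm used. One common model-agnostic option is permutation importance: shuffle one predictor at a time and record how much R² drops. The sketch below applies it to an OLS model on simulated data; the predictor names, coefficients and number of repeats are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def r2(obs, pred):
    """MSE-based R^2: 1 - SSE/SST."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def permutation_importance(beta, X, y, n_repeats=20, seed=0):
    """Mean drop in R^2 when each predictor column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    base = r2(y, predict(beta, X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between x_j and y
            drops[j] += base - r2(y, predict(beta, Xp))
    return drops / n_repeats

rng = np.random.default_rng(1)
names = ["dispersion_model", "satellite", "traffic"]   # hypothetical predictor names
n = 2399                                               # site count borrowed from the NO2 design
X = rng.normal(size=(n, 3))                            # simulated predictor values
y = 20 + 3.0 * X[:, 0] + 0.5 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=3.0, size=n)

beta = fit_ols(X, y)
imp = permutation_importance(beta, X, y)
for name, d in sorted(zip(names, imp), key=lambda t: -t[1]):
    print(f"{name}: drop in R2 = {d:.3f}")
```

For simplicity the drop is computed on the training data; evaluating it on held-out sites would avoid rewarding predictors that the model has merely overfitted.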
Pages: 14