Effect of the number of measurement sites on land use regression models in estimating local air pollution

Cited by: 138
Authors
Basagana, Xavier [1 ,2 ,3 ]
Rivera, Marcela [1 ,2 ,3 ,4 ]
Aguilera, Inmaculada [1 ,2 ,3 ]
Agis, David [1 ,2 ,3 ]
Bouso, Laura [1 ,2 ,3 ]
Elosua, Roberto [2 ,3 ]
Foraster, Maria [1 ,2 ,3 ,4 ]
de Nazelle, Audrey [1 ,2 ,3 ]
Nieuwenhuijsen, Mark [1 ,2 ,3 ]
Vila, Joan [2 ,3 ]
Kuenzli, Nino [4 ,5 ,6 ]
Affiliations
[1] Ctr Res Environm Epidemiol CREAL, Barcelona 08003, Catalonia, Spain
[2] IMIM Hosp del Mar Res Inst, Barcelona, Spain
[3] CIBERESP, Barcelona, Spain
[4] UPF, Barcelona, Spain
[5] Swiss Trop & Publ Hlth Inst, Basel, Switzerland
[6] Univ Basel, Basel, Switzerland
Keywords
Land use regression; Measurement error; Modeling; NO2; Residential exposure; Spain
DOI
10.1016/j.atmosenv.2012.01.064
Chinese Library Classification
X [Environmental Science, Safety Science]
Subject classification codes
08; 0830
Abstract
Land use regression (LUR) models are often used in epidemiologic studies to predict the air pollution exposure of health study participants. Such models are often based on a small to moderate number of air pollution measurement sites across the study area, and on a set of variables, characterizing factors such as traffic patterns and surrounding land uses, that serve as potential predictors. We used resampling techniques on a set of 148 NO2 measurement sites in the urban area of Girona (Spain) to investigate the effect of the number of measurement sites on LUR model performance, in particular on predictive ability and on the variables chosen in the final model. In addition, we investigated the effect of the number of potential predictors and of the variable selection algorithm used, and the consequences of using LUR predictions in regression models for a health outcome. Our results showed that, especially in small samples, both the adjusted within-sample R² and the leave-one-out cross-validation R² tended to be highly inflated compared with the models' predictive ability in a validation dataset. When the number of potential predictors was high, LUR models developed with a small number of measurement sites tended to give higher within-sample and cross-validated R² than those developed with more sites. Validation-dataset R² showed that models developed with a small number of sites performed poorly, and that performance improved as the number of sites increased. Models developed with a small number of sites tended to select a different set of variables every time, were very sensitive to the number of potential predictors offered, and resulted in stronger attenuation of coefficients when air pollution predictions were used in a health model. Our results suggest that LUR models aimed at characterizing local air pollution levels in complex urban settings should be based on a large number of measurement sites (>80 in our setting) and that the set of potential predictors should be restricted. © 2012 Elsevier Ltd. All rights reserved.
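The resampling comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not use the Girona data: it generates synthetic predictors and a synthetic outcome for 148 hypothetical sites, and it assumes a greedy forward selection on adjusted R² as a stand-in for whatever selection algorithm the paper used. All function names (`forward_select`, `loocv_r2`, `simulate`) and all parameter values are illustrative assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, max_vars=5):
    """Greedy forward selection maximizing adjusted R^2 (assumed algorithm)."""
    n, chosen, best_adj = len(y), [], -np.inf
    while len(chosen) < max_vars:
        scores = {}
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            fit = r2(y, predict(fit_ols(X[:, cols], y), X[:, cols]))
            scores[j] = 1 - (1 - fit) * (n - 1) / (n - len(cols) - 1)
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_adj:
            break  # no candidate improves adjusted R^2
        chosen.append(j_best)
        best_adj = scores[j_best]
    return chosen, best_adj

def loocv_r2(X, y):
    """Leave-one-out R^2 with the variable set held fixed (hat-matrix shortcut)."""
    A = np.column_stack([np.ones(len(X)), X])
    H = A @ np.linalg.pinv(A)
    loo_resid = (y - H @ y) / (1 - np.diag(H))
    return 1 - np.sum(loo_resid ** 2) / np.sum((y - y.mean()) ** 2)

def simulate(n_train, n_sites=148, n_pred=30, n_rep=50, seed=0):
    """Mean (adjusted R^2, LOOCV R^2, validation R^2) over random subsamples."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_sites, n_pred))  # synthetic candidate predictors
    # Synthetic outcome: only the first three predictors carry signal.
    y = X[:, 0] + 0.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n_sites)
    results = []
    for _ in range(n_rep):
        idx = rng.permutation(n_sites)
        tr, va = idx[:n_train], idx[n_train:]  # training sites, held-out sites
        cols, adj = forward_select(X[tr], y[tr])
        beta = fit_ols(X[tr][:, cols], y[tr])
        val = r2(y[va], predict(beta, X[va][:, cols]))
        results.append((adj, loocv_r2(X[tr][:, cols], y[tr]), val))
    return np.mean(results, axis=0)

if __name__ == "__main__":
    for n in (20, 40, 80, 100):
        adj, loo, val = simulate(n)
        print(f"n={n:3d}  adj R2={adj:.2f}  LOOCV R2={loo:.2f}  validation R2={val:.2f}")
```

Note that the leave-one-out step here keeps the selected variables fixed rather than repeating the selection in each fold, which is one reason a cross-validated R² can stay optimistic relative to a held-out validation set.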
Pages: 634-642
Number of pages: 9