Mitigating the impact of measurement error when using penalized regression to model exposure in two-stage air pollution epidemiology studies

Times cited: 11
Authors
Bergen, Silas [1]
Szpiro, Adam A. [2]
Affiliations
[1] Winona State Univ, Winona, MN 55987 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
Measurement error; Penalized regression; PM2.5; Systolic blood pressure; Two-stage modeling; LEAST-SQUARES; ESTIMATOR
DOI
10.1007/s10651-015-0314-y
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Air pollution epidemiology studies often implement a two-stage approach. Exposure models are built from observed monitoring data to predict exposure at participant locations where the true exposure is unobserved, and the predictions are then used to estimate the health effect. This induces measurement error, which may bias the estimated health effect and affect its standard error. The impact of measurement error depends on the assumed data-generating mechanisms and on the approach used to estimate and predict exposure. A paradigm in which the exposure surface is fixed and the subject and monitoring locations are random has been previously motivated, but corresponding measurement error methods exist only for exposure models based on simple, low-rank, unpenalized regression splines. We develop a comprehensive treatment of measurement error when modeling exposure with high-but-fixed-rank penalized regression splines. If sufficiently rich, these models closely approximate full-rank methods such as universal kriging while remaining asymptotically tractable. We describe the implications of penalization for measurement error, motivate choosing the penalty to optimize health effect inference, derive an asymptotic bias correction, and provide a simple non-parametric bootstrap to account for all sources of variability. We find that highly parameterizing the exposure model results in severely biased and inefficient health effect inference if no penalty is used. Choosing the penalty to mitigate measurement error yields much less bias and better efficiency, and can lead to better confidence interval coverage than other common penalty selection methods. Combining the bias correction with the non-parametric bootstrap yields accurate coverage of nominal 95% confidence intervals.
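The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional locations, a Gaussian radial basis standing in for the regression splines, a plain ridge penalty with a fixed value `lam`, and simulated data. It omits the asymptotic bias correction and the penalty selection step. All function names, the basis, and the simulation settings are hypothetical choices made for illustration.

```python
import numpy as np

def basis(s, knots, scale=0.15):
    """Intercept plus Gaussian radial basis functions (a hypothetical basis choice)."""
    d2 = (s[:, None] - knots[None, :]) ** 2
    return np.column_stack([np.ones(len(s)), np.exp(-d2 / (2 * scale**2))])

def fit_exposure(s_mon, x_mon, knots, lam):
    """Stage 1: ridge-penalized least squares; the intercept is not penalized."""
    B = basis(s_mon, knots)
    P = np.eye(B.shape[1])
    P[0, 0] = 0.0
    return np.linalg.solve(B.T @ B + lam * P, B.T @ x_mon)

def health_effect(s_mon, x_mon, s_sub, y_sub, knots, lam):
    """Stage 2: OLS slope of the outcome on predicted exposure at subject locations."""
    gamma = fit_exposure(s_mon, x_mon, knots, lam)
    w = basis(s_sub, knots) @ gamma
    X = np.column_stack([np.ones(len(w)), w])
    return np.linalg.lstsq(X, y_sub, rcond=None)[0][1]

def bootstrap_se(s_mon, x_mon, s_sub, y_sub, knots, lam, n_boot=200, seed=0):
    """Non-parametric bootstrap: resample monitors and subjects, refit both stages."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(n_boot):
        i = rng.integers(0, len(s_mon), len(s_mon))
        j = rng.integers(0, len(s_sub), len(s_sub))
        est.append(health_effect(s_mon[i], x_mon[i], s_sub[j], y_sub[j], knots, lam))
    return np.std(est, ddof=1)

# Simulated data (assumed): fixed exposure surface, random locations.
rng = np.random.default_rng(1)
surface = lambda s: np.sin(2 * np.pi * s)
s_mon = rng.uniform(0, 1, 100)                     # monitoring locations
x_mon = surface(s_mon) + rng.normal(0, 0.1, 100)   # monitored exposure
s_sub = rng.uniform(0, 1, 500)                     # subject locations
y_sub = 1.0 + 2.0 * surface(s_sub) + rng.normal(0, 1.0, 500)  # true effect is 2
knots = np.linspace(0, 1, 10)

beta_hat = health_effect(s_mon, x_mon, s_sub, y_sub, knots, lam=1.0)
se = bootstrap_se(s_mon, x_mon, s_sub, y_sub, knots, lam=1.0)
print(f"estimated health effect: {beta_hat:.3f}, bootstrap SE: {se:.3f}")
```

Resampling monitors and subjects together and refitting the exposure model inside each replicate is what lets the bootstrap standard error reflect variability from both stages, rather than treating the predicted exposure as known.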
Pages: 601-631
Number of pages: 31