On the effect of preferential sampling in spatial prediction

Cited by: 52
Authors
Gelfand, Alan E. [2 ]
Sahu, Sujit K. [1 ]
Holland, David M. [3 ]
Affiliations
[1] Univ Southampton, Southampton Stat Sci Res Inst, Math Acad Unit, Southampton, Hants, England
[2] Duke Univ, Inst Stat & Decis, Durham, NC USA
[3] US EPA, Natl Exposure Res Lab, Res Triangle Pk, NC 27711 USA
Keywords
fitting model; hierarchical model; informative covariate; intensity; sampling model; spatial point pattern;
DOI
10.1002/env.2169
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
The choice of the sampling locations in a spatial network is often guided by practical demands. In particular, many locations are preferentially chosen to capture high values of a response, for example, air pollution levels in environmental monitoring. Then, model estimation and prediction of the exposure surface become biased because of the selective sampling. As prediction is often the main utility of the modeling, we suggest that the effect of preferential sampling lies more importantly in the resulting predictive surface than in parameter estimation. We take demonstration of this effect as our focus. In particular, our contribution is to offer a direct simulation-based approach to assessing the effects of preferential sampling. We compare two predictive surfaces over the study region, one originating from the notion of an operating intensity, driving the selection of monitoring sites, the other under complete spatial randomness. We can consider a range of response models. They may reflect the operating intensity, introduce alternative informative covariates, or just propose a flexible spatial model. Then, we can generate data under the given model. Upon fitting the model and interpolating (kriging), we will obtain two predictive surfaces to compare with the known truth. It is important to note that we need suitable metrics to compare the surfaces and that the predictive surfaces are random, so we need to make expected comparisons. We also present an examination of real data using ozone exposures. Here, what we can show is that, within a given network, there can be substantial differences in the spatial prediction using preferentially chosen locations versus roughly randomly selected locations and that the latter provide much improved predictive validation. Copyright (c) 2012 John Wiley & Sons, Ltd.
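The simulation workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the study region is a 15 x 15 grid on the unit square, the true surface is a zero-mean Gaussian process with exponential covariance, the operating intensity is taken proportional to exp(beta * surface), complete spatial randomness is approximated by choosing grid cells uniformly, covariance parameters are treated as known when kriging, and root mean squared error stands in for the "suitable metrics". The names exp_cov, krige, beta, and nugget are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=0.2):
    """Exponential covariance sigma2 * exp(-d / phi) between two sets of 2-D points."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def krige(grid, sites, y, nugget):
    """Ordinary kriging of the latent surface; covariance parameters assumed known."""
    K = exp_cov(sites, sites) + nugget * np.eye(len(sites))
    ones = np.ones(len(sites))
    Kinv_y, Kinv_1 = np.linalg.solve(K, np.column_stack([y, ones])).T
    mu_hat = (ones @ Kinv_y) / (ones @ Kinv_1)  # GLS estimate of the constant mean
    return mu_hat + exp_cov(grid, sites) @ (Kinv_y - mu_hat * Kinv_1)

rng = np.random.default_rng(0)
side = np.linspace(0.0, 1.0, 15)
grid = np.array([(x, y) for x in side for y in side])  # 225 prediction cells
# Cholesky factor used to draw the true surface; small jitter for numerical stability.
chol = np.linalg.cholesky(exp_cov(grid, grid) + 1e-8 * np.eye(len(grid)))

n_rep, n_sites, beta, nugget = 50, 30, 2.0, 0.05  # illustrative values only
rmse = np.empty((n_rep, 2))  # column 0: preferential design, column 1: random design
for r in range(n_rep):
    truth = chol @ rng.standard_normal(len(grid))  # known true surface
    w = np.exp(beta * truth)  # assumed operating intensity, favors high values
    designs = (
        rng.choice(len(grid), size=n_sites, replace=False, p=w / w.sum()),
        rng.choice(len(grid), size=n_sites, replace=False),  # discrete stand-in for CSR
    )
    for j, idx in enumerate(designs):
        y = truth[idx] + np.sqrt(nugget) * rng.standard_normal(n_sites)  # noisy data
        pred = krige(grid, grid[idx], y, nugget)  # predictive surface on the grid
        rmse[r, j] = np.sqrt(np.mean((pred - truth) ** 2))

# Averaging over replicates gives the "expected comparison" of the two surfaces.
mean_rmse = rmse.mean(axis=0)
print("mean RMSE (preferential, random):", mean_rmse)
```

Averaging over replicates matters because each predictive surface is random. A single replicate can favor either design, so only the expected metric is comparable. Setting beta to 0 makes the two designs coincide in distribution, which gives a simple sanity check.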
Pages: 565-578
Page count: 14