Prediction and model evaluation for space-time data

Cited by: 3

Authors:

Watson, G. L. [1]

Reid, C. E. [2]

Jerrett, M. [3]

Telesca, D. [1,4]

Affiliations:

[1] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA USA

[2] Univ Colorado, Dept Geog, Boulder, CO USA

[3] Univ Calif Los Angeles, Dept Environm Hlth Sci, Los Angeles, CA USA

[4] UCLA Fielding Sch Publ Hlth, Box 177220,Suite 51-254 CHS, Los Angeles, CA 90095 USA

Source:

JOURNAL OF APPLIED STATISTICS | 2024 / Vol. 51 / No. 10

Keywords:

Cross validation; generalization error; machine learning; point process; space-time data; FINE PARTICULATE MATTER; CROSS-VALIDATION; SPATIOTEMPORAL PREDICTION; PM2.5; CONCENTRATIONS; WEIGHTED REGRESSION; MEASUREMENT ERROR; SELECTION;

DOI:

10.1080/02664763.2023.2252208

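Chinese Library Classification number:

O21 [Probability theory and mathematical statistics]; C8 [Statistics];

Discipline classification code: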

020208 ; 070103 ; 0714 ;

Abstract:

Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
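The leave-one-location-out (LOLO) partition named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy data, the function names (`lolo_splits`, `same_time_mean_predictor`, `lolo_rmse`) and the deliberately naive interpolator are all assumptions made for illustration. The one idea it shows is that every observation from a held-out location leaves the training set together, so the error reflects prediction at an unmonitored site.

```python
import math
from collections import defaultdict


def lolo_splits(location_ids):
    """Yield (held_out_location, train_idx, test_idx), one fold per unique location."""
    for loc in sorted(set(location_ids)):
        test_idx = [i for i, l in enumerate(location_ids) if l == loc]
        train_idx = [i for i, l in enumerate(location_ids) if l != loc]
        yield loc, train_idx, test_idx


def same_time_mean_predictor(train_times, train_values):
    """Toy interpolator (an assumption, not a method from the paper): predict the
    mean of the training observations recorded at the same time point, falling
    back to the overall training mean for unseen time points."""
    by_time = defaultdict(list)
    for t, y in zip(train_times, train_values):
        by_time[t].append(y)
    overall = sum(train_values) / len(train_values)
    means = {t: sum(v) / len(v) for t, v in by_time.items()}
    return lambda t: means.get(t, overall)


def lolo_rmse(location_ids, times, values):
    """Pool squared errors over all LOLO folds and return the root mean square."""
    sq_errors = []
    for _, train_idx, test_idx in lolo_splits(location_ids):
        predict = same_time_mean_predictor(
            [times[i] for i in train_idx],
            [values[i] for i in train_idx],
        )
        for i in test_idx:
            sq_errors.append((values[i] - predict(times[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical toy data: three monitors (A, B, C) observed on two days.
location_ids = ["A", "A", "B", "B", "C", "C"]
times = [1, 2, 1, 2, 1, 2]
values = [10.0, 20.0, 12.0, 22.0, 14.0, 24.0]

rmse = lolo_rmse(location_ids, times, values)
print(f"LOLO RMSE: {rmse:.3f}")
```

With a real model, the toy predictor would be replaced by whatever spatial or machine learning interpolator is being evaluated. Libraries such as scikit-learn offer an equivalent grouped split (`LeaveOneGroupOut`, with the location identifier passed as the group).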


Pages: 2007-2024

Page count: 18

