Prediction and model evaluation for space-time data

Cited by: 3
Authors
Watson, G. L. [1]
Reid, C. E. [2]
Jerrett, M. [3]
Telesca, D. [1,4]
Affiliations
[1] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA USA
[2] Univ Colorado, Dept Geog, Boulder, CO USA
[3] Univ Calif Los Angeles, Dept Environm Hlth Sci, Los Angeles, CA USA
[4] UCLA Fielding Sch Publ Hlth, Box 177220,Suite 51-254 CHS, Los Angeles, CA 90095 USA
Keywords
Cross validation; generalization error; machine learning; point process; space-time data; FINE PARTICULATE MATTER; CROSS-VALIDATION; SPATIOTEMPORAL PREDICTION; PM2.5; CONCENTRATIONS; WEIGHTED REGRESSION; MEASUREMENT ERROR; SELECTION
DOI
10.1080/02664763.2023.2252208
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
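As a rough illustration of the leave-one-location-out (LOLO) idea named in the abstract, the sketch below holds out every observation from one monitoring location at a time and predicts those observations from the remaining locations. It is a minimal sketch under stated assumptions, not the authors' procedure: the inverse-distance-weighted interpolator, the simulated data, and the names `idw_predict`, `lolo_cv_rmse`, and `simulate` are all hypothetical stand-ins chosen for this example.

```python
import math
import random
from collections import defaultdict


def idw_predict(train, x, y, t, power=2.0):
    """Inverse-distance-weighted mean of training values observed at time t.

    Hypothetical stand-in predictor; any space-time model could be used here.
    """
    num = den = 0.0
    for _, xi, yi, ti, vi in train:
        if ti != t:
            continue  # only borrow strength from the same time step
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # coincident training site: return its value directly
        w = d ** -power
        num += w * vi
        den += w
    return num / den if den > 0 else float("nan")


def lolo_cv_rmse(records, predict=idw_predict):
    """Leave-one-location-out CV: each fold is ALL observations at one location."""
    by_loc = defaultdict(list)
    for rec in records:
        by_loc[rec[0]].append(rec)

    sq_errors = []
    for loc, held_out in by_loc.items():
        # Training data excludes every observation from the held-out location.
        train = [r for r in records if r[0] != loc]
        for _, x, y, t, v in held_out:
            pred = predict(train, x, y, t)
            if not math.isnan(pred):
                sq_errors.append((pred - v) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def simulate(n_locations=12, n_times=5, seed=0):
    """Toy data (assumed, not from the paper): records are (loc, x, y, t, value)."""
    rng = random.Random(seed)
    records = []
    for loc in range(n_locations):
        x, y = rng.random(), rng.random()
        site_effect = rng.gauss(0.0, 1.0)  # persistent site-level deviation
        for t in range(n_times):
            value = 10.0 + 3.0 * x + site_effect + rng.gauss(0.0, 0.1)
            records.append((loc, x, y, t, value))
    return records


records = simulate()
rmse = lolo_cv_rmse(records)
print(f"LOLO CV RMSE: {rmse:.3f}")
```

Because each fold removes a whole location, the held-out site never contributes its own repeated measurements to the training set, which is what lets the error estimate target interpolation to unmonitored locations rather than prediction at already-observed ones.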
Pages: 2007-2024
Page count: 18