Fast Calculation of Gaussian Process Multiple-Fold Cross-Validation Residuals and their Covariances

Cited: 1
Authors
Ginsbourger, David [1 ]
Schaerer, Cedric [1 ]
Affiliation
[1] Univ Bern, Dept Math & Stat, Bern, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Cross-validation; Diagnostics; Gaussian process; Hyperparameter estimation; Universal kriging; Woodbury formula; COMPUTER EXPERIMENTS; STRATEGIES;
DOI
10.1080/10618600.2024.2353633
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We generalize fast Gaussian process leave-one-out formulas to multiple-fold cross-validation, highlighting in turn the covariance structure of cross-validation residuals in simple and universal kriging frameworks. We illustrate how the resulting covariances affect model diagnostics. We further establish, in the case of noiseless observations, that correcting for covariances between residuals in cross-validation-based estimation of the scale parameter leads back to maximum likelihood estimation. In broader settings, we also highlight how differences between pseudo-likelihood and likelihood methods boil down to whether or not residual covariances are accounted for. The proposed fast calculation of cross-validation residuals is implemented and benchmarked against a naive implementation, all in R. Numerical experiments highlight the substantial speed-ups that our approach enables. However, as supported by a discussion of the main drivers of computational costs and by a numerical benchmark, speed-ups steeply decline as the number of folds (say, all sharing the same size) decreases. An application to a contaminant localization test case illustrates that the way observations are grouped into folds may affect model assessment and parameter fitting compared to leave-one-out. Overall, our results enable fast multiple-fold cross-validation, have consequences for model diagnostics, and pave the way to future work on hyperparameter fitting as well as on goal-oriented fold design. Supplementary materials for this article are available online.
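The closed forms the abstract alludes to can be sketched as follows, under the simple-kriging assumptions it describes (zero-mean GP, noiseless observations, covariance matrix K). Writing K⁻¹ for the inverse kernel matrix, the residuals of fold I admit the known block form e_I = ((K⁻¹)_II)⁻¹ (K⁻¹y)_I, so a single factorization of K replaces one GP refit per fold. This is a minimal NumPy sketch, not the paper's R implementation; all function names are hypothetical, and the cross-fold residual covariance formula is given as stated in the abstract's simple-kriging setting.

```python
import numpy as np

def fold_residuals_fast(K, y, folds):
    """Multiple-fold CV residuals without refitting: for each fold I,
    e_I = y_I - m_{-I}(x_I) = solve((K^-1)[I, I], (K^-1 y)[I])."""
    Kinv = np.linalg.inv(K)
    Kinv_y = Kinv @ y
    return [np.linalg.solve(Kinv[np.ix_(I, I)], Kinv_y[I]) for I in folds]

def fold_residuals_naive(K, y, folds):
    """Reference implementation: for each fold, condition a zero-mean GP
    on the complementary observations and predict the held-out points."""
    n = len(y)
    out = []
    for I in folds:
        J = np.setdiff1d(np.arange(n), I)
        pred = K[np.ix_(I, J)] @ np.linalg.solve(K[np.ix_(J, J)], y[J])
        out.append(y[I] - pred)
    return out

def fold_residual_cov(K, I, J, sigma2=1.0):
    """Cross-covariance of CV residuals between folds I and J (simple
    kriging): Cov(e_I, e_J) = sigma2 * B_II^-1 B_IJ B_JJ^-1, B = K^-1."""
    Kinv = np.linalg.inv(K)
    A = np.linalg.solve(Kinv[np.ix_(I, I)], Kinv[np.ix_(I, J)])
    return sigma2 * np.linalg.solve(Kinv[np.ix_(J, J)].T, A.T).T
```

The fast route computes K⁻¹ once (O(n³)) and then solves one small system per fold, whereas the naive route re-solves an (n − |I|)-dimensional system for every fold; this is the cost asymmetry behind the benchmark speed-ups discussed in the abstract, and it also explains why the gains shrink as folds become fewer and larger.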
Pages: 1-14 (14 pages)
Related Articles
50 records in total
  • [1] Multiple predicting K-fold cross-validation for model selection
    Jung, Yoonsuh
    JOURNAL OF NONPARAMETRIC STATISTICS, 2018, 30 (01) : 197 - 215
  • [2] A computationally fast alternative to cross-validation in penalized Gaussian graphical models
    Vujacic, Ivan
    Abbruzzo, Antonino
    Wit, Ernst
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2015, 85 (18) : 3628 - 3640
  • [3] Parallel cross-validation: A scalable fitting method for Gaussian process models
    Gerber, Florian
    Nychka, Douglas W.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 155
  • [4] Fast cross-validation in harmonic approximation
    Bartel, Felix
    Hielscher, Ralf
    Potts, Daniel
    APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS, 2020, 49 (02) : 415 - 437
  • [5] A K-fold averaging cross-validation procedure
    Jung, Yoonsuh
    Hu, Jianhua
    JOURNAL OF NONPARAMETRIC STATISTICS, 2015, 27 (02) : 167 - 179
  • [6] Stratified Cross-Validation on Multiple Columns
    Motl, Jan
    Kordik, Pavel
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 26 - 31
  • [7] Gaussian Mixture Optimization Based on Efficient Cross-Validation
    Shinozaki, Takahiro
    Furui, Sadaoki
    Kawahara, Tatsuya
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (03) : 540 - 547
  • [8] Fast Cross-Validation via Sequential Testing
    Krueger, Tammo
    Panknin, Danny
    Braun, Mikio
    JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 1103 - 1155
  • [9] No unbiased estimator of the variance of K-fold cross-validation
    Bengio, Y
    Grandvalet, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 5 : 1089 - 1105
  • [10] K-fold cross-validation for complex sample surveys
    Wieczorek, Jerzy
    Guerin, Cole
    McMahon, Thomas
    STAT, 2022, 11 (01)