Fast Calculation of Gaussian Process Multiple-Fold Cross-Validation Residuals and their Covariances

Cited by: 1
Authors
Ginsbourger, David [1 ]
Schaerer, Cedric [1 ]
Affiliation
[1] Univ Bern, Dept Math & Stat, Bern, Switzerland
Funding
Swiss National Science Foundation
Keywords
Cross-validation; Diagnostics; Gaussian process; Hyperparameter estimation; Universal kriging; Woodbury formula; Computer experiments; Strategies
DOI
10.1080/10618600.2024.2353633
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We generalize fast Gaussian process leave-one-out formulas to multiple-fold cross-validation, highlighting in turn the covariance structure of cross-validation residuals in simple and universal kriging frameworks. We illustrate how resulting covariances affect model diagnostics. We further establish in the case of noiseless observations that correcting for covariances between residuals in cross-validation-based estimation of the scale parameter leads back to maximum likelihood estimation. Also, we highlight in broader settings how differences between pseudo-likelihood and likelihood methods boil down to accounting or not for residual covariances. The proposed fast calculation of cross-validation residuals is implemented and benchmarked against a naive implementation, all in R. Numerical experiments highlight the substantial speed-ups that our approach enables. However, as supported by a discussion on main drivers of computational costs and by a numerical benchmark, speed-ups steeply decline as the number of folds (say, all sharing the same size) decreases. An application to a contaminant localization test case illustrates that the way of grouping observations in folds may affect model assessment and parameter fitting compared to leave-one-out. Overall, our results enable fast multiple-fold cross-validation, have consequences in model diagnostics, and pave the way to future work on hyperparameter fitting as well as on goal-oriented fold design. Supplementary materials for this article are available online.
Pages: 1-14
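To make the abstract's central idea concrete, below is a minimal R sketch (not the authors' released code) of fast multiple-fold cross-validation residuals in the simple-kriging setting assumed here: zero prior mean, noiseless observations, and a fully known covariance matrix K. It contrasts a naive fold-by-fold recomputation with the shortcut that reuses a single inverse of K, and also shows the cross-covariance of fold residuals in this setting. All function and variable names (`fast_cv_residuals`, `naive_cv_residuals`, `sq_exp_kernel`, `cv_residual_cov`) are illustrative choices, not names taken from the paper or its supplementary material.

```r
## Synthetic data: a GP-like response observed at n = 60 one-dimensional inputs.
set.seed(1)
n <- 60
x <- sort(runif(n))
sq_exp_kernel <- function(a, b, sigma2 = 1, ell = 0.2) {
  sigma2 * exp(-0.5 * outer(a, b, "-")^2 / ell^2)
}
K <- sq_exp_kernel(x, x) + diag(1e-8, n)   # small jitter for numerical stability
y <- drop(t(chol(K)) %*% rnorm(n))          # one draw from N(0, K)

## q folds of (roughly) equal size, here q = 6 folds of 10 points each.
q <- 6
folds <- split(sample(n), rep(seq_len(q), length.out = n))

## Naive multiple-fold CV: re-solve a large linear system for every held-out fold.
naive_cv_residuals <- function(K, y, folds) {
  lapply(folds, function(I) {
    J <- setdiff(seq_along(y), I)
    y[I] - K[I, J] %*% solve(K[J, J], y[J])
  })
}

## Fast multiple-fold CV: one factorization of K, then small per-fold solves,
## using e_I = [(K^{-1})_{I,I}]^{-1} (K^{-1} y)_I, a multi-fold analogue of the
## classical leave-one-out shortcut (assumed simple-kriging, noiseless setting).
fast_cv_residuals <- function(K, y, folds) {
  Kinv  <- chol2inv(chol(K))
  alpha <- Kinv %*% y
  lapply(folds, function(I) solve(Kinv[I, I], alpha[I]))
}

## Cross-covariance of residuals for folds I and J in the same setting:
## Cov(e_I, e_J) = [(K^{-1})_{I,I}]^{-1} (K^{-1})_{I,J} [(K^{-1})_{J,J}]^{-1}.
cv_residual_cov <- function(K, folds, I_id, J_id) {
  Kinv <- chol2inv(chol(K))
  I <- folds[[I_id]]; J <- folds[[J_id]]
  solve(Kinv[I, I]) %*% Kinv[I, J] %*% solve(Kinv[J, J])
}

## The naive and fast residuals agree up to numerical error.
max(abs(unlist(naive_cv_residuals(K, y, folds)) -
        unlist(fast_cv_residuals(K, y, folds))))
```

Under these assumptions the fast route performs one O(n^3) factorization plus one small solve per fold, instead of one near-O(n^3) solve per fold, which is the source of the speed-ups (and of their decline as folds become few and large) discussed in the abstract.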