A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression

Cited by: 4
Authors
Liland, Kristian Hovde [1]
Skogholt, Joakim [1]
Indahl, Ulf Geir [1]
Affiliations
[1] Norwegian Univ Life Sci, Fac Sci & Technol, N-1432 As, Norway
Keywords
Cross-validation; GCV; PRESS statistic; ridge regression; SVD; Tikhonov regularisation; SPECTROSCOPY; PREDICTION; MILK
DOI
10.1109/ACCESS.2024.3357097
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In the present paper, we prove a new theorem that yields an update formula for linear regression model residuals. The formula computes the exact k-fold cross-validation residuals for any choice of cross-validation strategy without refitting the model. The sizes of the required matrix inversions are bounded by the cross-validation segment sizes, and the inversions can be executed efficiently in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. For situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximation of the cross-validated residuals and the associated Predicted Residual Sum of Squares (PRESS) statistic. We also suggest strategies for efficient estimation of the minimum PRESS value and of the full PRESS function over a selected interval of regularisation values. The computational efficiency of the resulting parameter selection for ridge and Tikhonov regression modelling, based on our theoretical findings and heuristic arguments, is demonstrated in several applications with real and highly multivariate datasets.
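The idea of obtaining k-fold cross-validation residuals without refitting can be illustrated with a minimal sketch. It assumes ridge regression without an intercept or centring step and uses the standard block identity e_cv = (I - H_kk)^{-1} e_k, where H is the full-data hat matrix and k indexes the rows of one segment. This is a textbook identity and is not necessarily the exact form of the theorem in the paper. The function names, the simulated data, and the value of `lam` are hypothetical choices made for illustration only.

```python
import numpy as np

def ridge_hat_matrix(X, lam):
    """Hat matrix H = X (X'X + lam*I)^-1 X' (assumption: no intercept, no centring)."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def cv_residuals_no_refit(X, y, lam, folds):
    """Segment-wise CV residuals computed from a single full-data fit.

    folds: list of integer index arrays that partition the rows of X.
    """
    H = ridge_hat_matrix(X, lam)
    e = y - H @ y                       # ordinary residuals of the full fit
    e_cv = np.empty_like(e)
    for idx in folds:
        H_kk = H[np.ix_(idx, idx)]      # diagonal block of H for this segment
        # Only a (segment size x segment size) system is solved per fold.
        e_cv[idx] = np.linalg.solve(np.eye(len(idx)) - H_kk, e[idx])
    return e_cv

def cv_residuals_refit(X, y, lam, folds):
    """Reference implementation: refit the model with each segment held out."""
    n, p = X.shape
    e_cv = np.empty(n)
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        Xt, yt = X[train], y[train]
        b = np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
        e_cv[idx] = y[idx] - X[idx] @ b
    return e_cv

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
folds = np.array_split(rng.permutation(30), 5)
lam = 2.0

fast = cv_residuals_no_refit(X, y, lam, folds)
slow = cv_residuals_refit(X, y, lam, folds)
press = float(fast @ fast)              # PRESS = sum of squared CV residuals
```

With segments of size one, the block solve reduces to dividing each residual by 1 - h_ii, which is the familiar leave-one-out formula. Because the per-fold solves are independent of each other, the loop could also be run in parallel.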
Pages: 17349-17368
Page count: 20