Cross-Validation With Confidence

Cited by: 56
Authors
Lei, Jing [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; Lasso
DOI
10.1080/01621459.2019.1672556
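Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract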
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method offers a trade-off between prediction accuracy and model interpretability that differs from that of existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
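
The idea of returning a set of competitive models, rather than a single cross-validation winner, can be illustrated with a small sketch. The code below is not the procedure from the article; it is a simplified illustration under stated assumptions: held-out losses are already available as a table with one row per test point and one column per candidate model, each model is compared with every other through standardized loss differences, and a Gaussian multiplier bootstrap approximates the null distribution of the largest difference. The function name `cv_confidence_set` and its parameters are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def cv_confidence_set(losses, alpha=0.05, n_boot=1000, seed=0):
    """Illustrative only (assumed interface, not the article's algorithm).

    losses[i][m] is the held-out loss of candidate model m on test point i.
    Returns a list of booleans; True means model m is retained in the set.
    """
    rng = random.Random(seed)
    n, n_models = len(losses), len(losses[0])
    if n_models < 2:
        return [True] * n_models

    keep = []
    for m in range(n_models):
        stats, centered = [], []
        for j in range(n_models):
            if j == m:
                continue
            # Positive differences mean model m did worse than model j.
            diff = [row[m] - row[j] for row in losses]
            mu, sd = mean(diff), stdev(diff)
            sd = sd if sd > 0 else 1.0  # guard against constant differences
            stats.append(math.sqrt(n) * mu / sd)
            centered.append([(d - mu) / sd for d in diff])
        t_obs = max(stats)

        # Gaussian multiplier bootstrap for the maximum standardized difference.
        exceed = 0
        for _ in range(n_boot):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            t_boot = max(
                sum(zi * ci for zi, ci in zip(z, col)) / math.sqrt(n)
                for col in centered
            )
            exceed += t_boot >= t_obs
        # Retain model m unless it is significantly worse than some competitor.
        keep.append(exceed / n_boot > alpha)
    return keep
```

A model is dropped only when the data show it to be significantly worse than some competitor, so several near-equivalent models can remain in the returned set. Choosing the simplest retained model is one way such a set could trade prediction accuracy against interpretability.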
Pages: 1978-1997
Page count: 20