Cross-Validation With Confidence

Cited by: 56
Author
Lei, Jing [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; LASSO
DOI
10.1080/01621459.2019.1672556
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Cross-validation is one of the most popular methods for model and tuning parameter selection in statistics and machine learning. Despite its wide applicability, traditional cross-validation tends to overfit because it ignores the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it can achieve consistent variable selection in a classical linear regression setting, where existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method offers a trade-off between prediction accuracy and model interpretability that differs from those of existing variants of cross-validation. We demonstrate the performance of the proposed method on several simulated and real data examples. Supplemental materials for this article are available online.
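The abstract describes the idea only at a high level: instead of returning the single model with the smallest cross-validated loss, test for each candidate whether it could be the best one and keep every model that is not rejected. The Python sketch below illustrates that idea under our own simplifying assumptions and is not the paper's exact algorithm. It assumes each observation's held-out loss under every model is already available (for example from one round of V-fold cross-validation), pools the folds, uses the maximum of standardized loss differences as the test statistic, and calibrates it with a Gaussian multiplier bootstrap. The function name `cv_confidence_set`, the defaults for `alpha` and `n_boot`, and the variance floor are hypothetical illustrative choices.

```python
import math
import random
from statistics import fmean, stdev


def cv_confidence_set(losses, alpha=0.05, n_boot=2000, seed=0):
    """Return indices of candidate models not rejected as the best model.

    Illustrative sketch only (assumed simplification, not the paper's
    exact procedure). losses[m][i] is the held-out loss of model m on
    observation i, predicted by a fit that excluded i's fold.
    """
    n_models = len(losses)
    if n_models == 1:
        return [0]  # a lone candidate is trivially kept
    n = len(losses[0])
    rng = random.Random(seed)
    kept = []
    for m in range(n_models):
        stats, centered = [], []
        for j in range(n_models):
            if j == m:
                continue
            # Per-observation loss difference: positive means m did worse.
            d = [losses[m][i] - losses[j][i] for i in range(n)]
            mu = fmean(d)
            sd = max(stdev(d), 1e-12)  # floor avoids dividing by zero spread
            stats.append(math.sqrt(n) * mu / sd)
            centered.append([(x - mu) / sd for x in d])
        # H0: m is the best model. A large max statistic is evidence
        # that some rival j has a smaller expected loss than m.
        t_obs = max(stats)
        exceed = 0
        for _ in range(n_boot):
            # Gaussian multiplier bootstrap of the max statistic under H0.
            xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
            t_star = max(
                sum(w * c for w, c in zip(xi, col)) / math.sqrt(n)
                for col in centered
            )
            if t_star >= t_obs:
                exceed += 1
        if exceed / n_boot > alpha:
            kept.append(m)
    return kept
```

A model survives when its bootstrap p-value exceeds `alpha`, so a rival whose held-out losses are statistically indistinguishable from the best one's stays in the set, while a clearly worse model is dropped. Plain cross-validation would instead return only the single arg-min, which is the behavior the abstract links to overfitting.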
Pages: 1978-1997
Number of pages: 20