Cross-Validation With Confidence

Cited by: 56

Authors:

Lei, Jing [1]

Affiliations:

[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2020 / Vol. 115 / Issue 532

Keywords:

Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; LASSO

DOI:

10.1080/01621459.2019.1672556

Chinese Library Classification number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit, due to the ignorance of the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide an alternative trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.

Pages: 1978-1997

Number of pages: 20

