Cross-Validation With Confidence

Cited by: 56

Authors:

Lei, Jing [1]

Affiliations:

[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2020 / Vol. 115 / Issue 532

Keywords:

Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; LASSO

DOI:

10.1080/01621459.2019.1672556

Chinese Library Classification:

O21 [Probability theory and mathematical statistics]; C8 [Statistics]

Discipline classification codes:

020208; 070103; 0714

Abstract:

Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit, due to the ignorance of the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide an alternative trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
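The idea of returning a set of competitive candidates, rather than a single cross-validation winner, can be illustrated with a rough sketch. The code below is not the procedure from the article: it swaps in a plain paired normal test with a Bonferroni correction on per-observation test-loss differences, and the two candidate models, the simulated data, and all function names (`fit_constant`, `fit_line`, `cv_losses`, `confidence_set`) are assumptions made only for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_constant(xs, ys):
    """Hypothetical candidate 0: ignore x and predict the training mean."""
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Hypothetical candidate 1: least-squares line through the training data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_losses(xs, ys, fitters, k=5):
    """Return per-observation squared test errors, one list per candidate."""
    n = len(xs)
    losses = [[0.0] * n for _ in fitters]
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        for m, fit in enumerate(fitters):
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                losses[m][i] = (ys[i] - predict(xs[i])) ** 2
    return losses


def confidence_set(losses, alpha=0.05):
    """Keep each candidate that is not significantly worse than any rival."""
    n_models, n = len(losses), len(losses[0])
    # Assumption of this sketch: one-sided normal critical value with a
    # Bonferroni correction over the rivals of each candidate.
    crit = NormalDist().inv_cdf(1 - alpha / max(n_models - 1, 1))
    kept = []
    for m in range(n_models):
        rejected = False
        for j in range(n_models):
            if j == m:
                continue
            diff = [a - b for a, b in zip(losses[m], losses[j])]
            avg, sd = mean(diff), stdev(diff)
            if sd > 0:
                z = math.sqrt(n) * avg / sd
            else:
                z = math.inf if avg > 0 else 0.0
            if z > crit:
                rejected = True
        if not rejected:
            kept.append(m)
    return kept


# Simulated data with a clear linear signal (illustrative only).
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

losses = cv_losses(xs, ys, [fit_constant, fit_line])
selected = confidence_set(losses)
print("candidates kept:", selected)
```

With a signal this strong, the constant model should be rejected and only the line retained. When two candidates predict about equally well, a test of this kind would typically keep both, which is the sense in which the output is a set rather than a single choice.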

Pages: 1978-1997

Page count: 20

50 items in total

[41] Cross-Validation Without Doing Cross-Validation in Genome-Enabled Prediction [J].

Gianola, Daniel ;

Schoen, Chris-Carolin .

G3-GENES GENOMES GENETICS, 2016, 6 (10) :3107-3128

[42] Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context [J].

Martinez, Josue G. ;

Carroll, Raymond J. ;

Mueller, Samuel ;

Sampson, Joshua N. ;

Chatterjee, Nilanjan .

AMERICAN STATISTICIAN, 2011, 65 (04) :223-228

[43] K-fold cross-validation for complex sample surveys [J].

Wieczorek, Jerzy ;

Guerin, Cole ;

McMahon, Thomas .

STAT, 2022, 11 (01)

[44] Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines [J].

Varoquaux, Gael ;

Raamana, Pradeep Reddy ;

Engemann, Denis A. ;

Hoyos-Idrobo, Andres ;

Schwartz, Yannick ;

Thirion, Bertrand .

NEUROIMAGE, 2017, 145 :166-179

[45] Cross-validation for change-point regression: Pitfalls and solutions [J].

Pein, Florian ;

Shah, Rajen D. .

BERNOULLI, 2025, 31 (01) :388-411

[46] An efficient variance estimator for cross-validation under partition sampling [J].

Wang, Qing ;

Cai, Xizhen .

STATISTICS, 2021, 55 (03) :660-681

[47] Concentration inequalities of the cross-validation estimator for empirical risk minimizer [J].

Cornec, Matthieu .

STATISTICS, 2017, 51 (01) :43-60

[48] Minimization and estimation of the variance of prediction errors for cross-validation designs [J].

Fuchs, Mathias ;

Krautenbacher, Norbert .

JOURNAL OF STATISTICAL THEORY AND PRACTICE, 2016, 10 (02) :420-443

[49] Network Cross-Validation for Determining the Number of Communities in Network Data [J].

Chen, Kehui ;

Lei, Jing .

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (521) :241-251

[50] A Universal Approximate Cross-Validation Criterion for Regular Risk Functions [J].

Commenges, Daniel ;

Proust-Lima, Cecile ;

Samieri, Cecilia ;

Liquet, Benoit .

INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2015, 11 (01) :51-67
