Cross-Validation With Confidence

Cited by: 47
Author
Lei, Jing [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; Lasso
DOI
10.1080/01621459.2019.1672556
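Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract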
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide a different trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
Pages: 1978-1997
Page count: 20
Related papers
50 in total
  • [21] Estimating the Number of Clusters Using Cross-Validation
    Fu, Wei
    Perry, Patrick O.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (01) : 162 - 173
  • [22] Estimation of density functionals via cross-validation
    Chacon, Jose E.
    Tenreiro, Carlos
    STATISTICA NEERLANDICA, 2024, 78 (04) : 743 - 758
  • [23] Cross-validation approaches for penalized Cox regression
    Dai, Biyue
    Breheny, Patrick
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2024, 33 (04) : 702 - 715
  • [24] Generalised correlated cross-validation
    Carmack, Patrick S.
    Spence, Jeffrey S.
    Schucany, William R.
    JOURNAL OF NONPARAMETRIC STATISTICS, 2012, 24 (02) : 269 - 282
  • [25] Confidence intervals for the Cox model test error from cross-validation
    Sun, Min Woo
    Tibshirani, Robert
    STATISTICS IN MEDICINE, 2023, 42 (25) : 4532 - 4541
  • [26] Cross-Validation for Correlated Data
    Rabinowicz, Assaf
    Rosset, Saharon
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (538) : 718 - 731
  • [27] Segmentation of the mean of heteroscedastic data via cross-validation
    Arlot, Sylvain
    Celisse, Alain
    STATISTICS AND COMPUTING, 2011, 21 : 613 - 632
  • [28] Fast Cross-Validation for Kernel-Based Algorithms
    Liu, Yong
    Liao, Shizhong
    Jiang, Shali
    Ding, Lizhong
    Lin, Hailun
    Wang, Weiping
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (05) : 1083 - 1096
  • [29] Regular, Median and Huber Cross-Validation: A Computational Comparison
    Yu, Chi-Wai
    Clarke, Bertrand
    STATISTICAL ANALYSIS AND DATA MINING, 2015, 8 (01) : 14 - 33
  • [30] Best subset selection via cross-validation criterion
    Takano, Yuichi
    Miyashiro, Ryuhei
    TOP, 2020, 28 (02) : 475 - 488