Cross-Validation With Confidence

Cited by: 47
Author
Lei, Jing [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; Lasso
DOI
10.1080/01621459.2019.1672556
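Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract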
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite their wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide a trade-off between prediction accuracy and model interpretability that differs from those of existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
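The idea of returning a set of competitive candidate models, rather than a single cross-validation winner, can be illustrated with a rough sketch. The code below is a simplified illustration and not the article's exact procedure: for each candidate it tests whether that candidate's held-out loss is significantly larger than some other candidate's, using a Gaussian multiplier bootstrap on studentized loss differences, and keeps every candidate that is not rejected. The function names (`cv_confidence_set`, `ols_on`), the squared-error loss, the fold count, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def ols_on(columns):
    """Hypothetical helper: least-squares fit with intercept on the given columns."""
    def fit(X_train, y_train):
        A = np.column_stack([np.ones(len(y_train)), X_train[:, columns]])
        beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        def predict(X_new):
            B = np.column_stack([np.ones(len(X_new)), X_new[:, columns]])
            return B @ beta
        return predict
    return fit

def cv_confidence_set(X, y, fitters, n_folds=5, alpha=0.05, n_boot=500, seed=0):
    """Return (kept model indices, p-values) from a simplified test on held-out losses."""
    rng = np.random.default_rng(seed)
    n, n_models = len(y), len(fitters)
    folds = np.array_split(rng.permutation(n), n_folds)

    # Held-out squared-error loss of every candidate on every sample.
    losses = np.empty((n, n_models))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for m, fit in enumerate(fitters):
            predict = fit(X[train_idx], y[train_idx])
            losses[test_idx, m] = (y[test_idx] - predict(X[test_idx])) ** 2

    p_values = np.ones(n_models)
    for m in range(n_models):
        others = [j for j in range(n_models) if j != m]
        if not others:
            continue
        # Loss differences: positive values mean candidate m did worse.
        diff = losses[:, [m]] - losses[:, others]
        mean = diff.mean(axis=0)
        sd = np.maximum(diff.std(axis=0, ddof=1), 1e-12)
        stat = np.max(np.sqrt(n) * mean / sd)
        # Gaussian multiplier bootstrap for the maximum studentized difference.
        centered = (diff - mean) / sd
        multipliers = rng.standard_normal((n_boot, n))
        boot = (multipliers @ centered / np.sqrt(n)).max(axis=1)
        p_values[m] = np.mean(boot >= stat)

    kept = [m for m in range(n_models) if p_values[m] > alpha]
    return kept, p_values

# Simulated data (an assumption for illustration): only the first feature matters.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 11))
y = 2.0 * X[:, 0] + rng.standard_normal(200)

candidates = [ols_on([]), ols_on([0]), ols_on(list(range(11)))]
kept, p_values = cv_confidence_set(X, y, candidates)
print("kept candidates:", kept)
print("p-values:", p_values)
```

A candidate stays in the returned set when the data do not show that it is clearly worse than a competitor, so the set may contain several models; choosing the simplest one in the set is one way to trade prediction accuracy for interpretability.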
Pages: 1978-1997
Page count: 20