Cross-validation;
Model selection;
Data splitting ratio;
Reverse k-fold;
Cross-validation paradox;
MODEL SELECTION;
VARIABLE SELECTION;
REGRESSION;
VARIANCE;
DOI:
10.1016/j.ins.2021.11.017
CLC number:
TP [Automation and computer technology];
Discipline classification code:
0812;
Abstract:
Cross-validation (CV), while being extensively used for model selection, may have three major weaknesses. First, the regular 10-fold CV is often unstable in its choice of the best model among the candidates. Second, the CV practice of singling out one candidate based on the total prediction errors over the different folds conveys no sensible information on how much one can trust the apparent winner. Third, when only one data splitting ratio is considered, it may, regardless of the choice of ratio, work very poorly in some situations. In this work, to address these shortcomings, we propose a new averaging-and-voting-based version of cross-validation that gives better comparison results. Simulations and real data are used to illustrate the superiority of the new approach over traditional CV methods. (c) 2021 Elsevier Inc. All rights reserved.
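The abstract does not spell out the procedure, but the contrast it draws can be sketched in code. The snippet below is only an illustrative sketch, not the authors' method: it compares a single 10-fold CV selection against a simple repeated-split voting scheme, and the candidate models, the number of repeats, and the voting rule are all assumptions chosen for illustration.

# Illustrative sketch only: NOT the paper's exact averaging-voting procedure.
# It contrasts (a) picking a winner from one 10-fold CV run with (b) repeating
# CV over many random splits and voting over the per-split winners, which also
# gives a rough sense of how much to trust the apparent winner.
import numpy as np
from collections import Counter
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (placeholder for any model-selection problem).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Hypothetical candidate models; the paper's candidate set may differ.
candidates = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

def cv_winner(X, y, candidates, k, seed):
    """Return the candidate with the smallest mean squared prediction error
    over one random k-fold split."""
    cv = KFold(n_splits=k, shuffle=True, random_state=seed)
    errors = {
        name: -cross_val_score(model, X, y, cv=cv,
                               scoring="neg_mean_squared_error").mean()
        for name, model in candidates.items()
    }
    return min(errors, key=errors.get)

# (a) Standard practice: a single 10-fold CV run names one apparent winner.
single_run_winner = cv_winner(X, y, candidates, k=10, seed=0)

# (b) Stabilized variant (sketch): repeat over 50 random splits and vote.
votes = Counter(cv_winner(X, y, candidates, k=10, seed=s) for s in range(50))
voted_winner, n_votes = votes.most_common(1)[0]

print(f"single 10-fold CV winner : {single_run_winner}")
print(f"voted winner over 50 runs: {voted_winner} ({n_votes}/50 votes)")

In this sketch the vote share across repeats serves as a crude confidence measure for the selected model, something a single total-error comparison does not provide; the paper itself should be consulted for the actual averaging-voting rule and its treatment of the data splitting ratio.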
Affiliations:
CNRS ENS INRIA UMR 8548, Lab Informat Ecole Normale Super, Willow Project Team, F-75214 Paris 13, France; Univ Lille 1, CNRS, Lab Math Paul Painleve, UMR 8524, F-59655 Villeneuve Dascq, France
Arlot, Sylvain
Celisse, Alain
Affiliations:
Univ Lille 1, CNRS, Lab Math Paul Painleve, UMR 8524, F-59655 Villeneuve Dascq, France
Affiliations:
Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA