Nested cross-validation when selecting classifiers is overzealous for most practical applications

Cited by: 153
Authors
Wainer, Jacques [1]
Cawley, Gavin [2]
Affiliations
[1] Univ Estadual Campinas, Comp Inst, BR-13083852 Campinas, SP, Brazil
[2] Univ East Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
Hyperparameters; Classification; Cross-validation; Nested cross-validation; Model selection
DOI
10.1016/j.eswa.2021.115222
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to apply a nested cross-validation procedure: hyperparameter selection is performed in the inner cross-validation, while the outer cross-validation computes an unbiased estimate of the expected accuracy of the algorithm with cross-validation based hyperparameter tuning. The alternative approach, which we shall call "flat cross-validation", uses a single cross-validation step both to select the optimal hyperparameter values and to provide an estimate of the expected accuracy of the algorithm that, while biased, may nevertheless still be used to select the best learning algorithm. We tested both procedures using 12 different algorithms on 115 real-life binary datasets and conclude that using the less computationally costly flat cross-validation procedure will generally result in the selection of an algorithm that is, for all practical purposes, of similar quality to that selected via nested cross-validation, provided the learning algorithms have relatively few hyperparameters to be optimised.
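The two procedures described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: it uses only the Python standard library, a k-nearest-neighbours classifier as a stand-in for the 12 algorithms in the study, a small synthetic binary dataset, and hypothetical helper names (`folds`, `flat_cv`, `nested_cv`).

```python
import random
from collections import Counter

def knn_predict(train, point, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test rows whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def folds(data, n_folds):
    """Yield (train, test) pairs; fold i holds rows i, i + n_folds, i + 2*n_folds, ..."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [row for j, row in enumerate(data) if j % n_folds != i]
        yield train, test

def flat_cv(data, grid, n_folds=5):
    """Flat CV: a single loop both tunes k and scores the algorithm.

    The returned score is the best mean accuracy over the grid, so it is
    optimistically biased as an estimate of generalisation accuracy.
    """
    scores = {}
    for k in grid:
        fold_scores = [accuracy(tr, te, k) for tr, te in folds(data, n_folds)]
        scores[k] = sum(fold_scores) / n_folds
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]

def nested_cv(data, grid, outer_folds=5, inner_folds=5):
    """Nested CV: tune k on each outer training split, score on the held-out fold."""
    total = 0.0
    for train, test in folds(data, outer_folds):
        best_k, _ = flat_cv(train, grid, inner_folds)  # inner loop selects k
        total += accuracy(train, test, best_k)         # outer fold never seen by tuning
    return total / outer_folds

# Synthetic binary dataset (an assumption for illustration): two Gaussian blobs.
rng = random.Random(0)
data = [
    ((rng.gauss(label, 1.0), rng.gauss(label, 1.0)), label)
    for label in (0, 1)
    for _ in range(60)
]
rng.shuffle(data)

grid = [1, 3, 5, 7, 9]  # hypothetical hyperparameter grid for k
flat_k, flat_score = flat_cv(data, grid)
nested_score = nested_cv(data, grid)
print("flat CV:   best k =", flat_k, " score =", round(flat_score, 3))
print("nested CV: score =", round(nested_score, 3))
```

To choose between several learning algorithms in this setup, one would compute the chosen score for each algorithm and pick the highest. The nested variant refits the tuning loop once per outer fold, which is the source of the extra computational cost the abstract refers to.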
Pages: 10