On the Accuracy of Cross-Validation in the Classification Problem

Times cited: 0
Authors
Nedel'ko, V. M. [1]
Affiliation
[1] Sobolev Inst Math SB RAS, Novosibirsk, Russia
Source
BULLETIN OF IRKUTSK STATE UNIVERSITY-SERIES MATHEMATICS | 2021, Vol. 38
Keywords
K-fold cross-validation; accuracy; statistical estimates; machine learning; STATISTICAL VIEW; PROBABILITY
DOI
10.26516/1997-7670.2021.38.84
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
This work studies the accuracy of cross-validation estimates for decision functions. The main idea of the research is a statistical modeling scheme that uses real data to obtain statistical estimates that are usually obtainable only with model (synthetic) distributions. The studies confirm the well-known empirical recommendation to use 5 or more folds. Using more than 10 folds does not significantly increase accuracy, and repeated cross-validation does not provide a fundamental gain in precision either. The experimental results support the empirical observation that estimates obtained by cross-validation are approximately as accurate as estimates obtained from a test sample half the size of the sample used for cross-validation. This is readily explained by the fact that all objects of a test sample are independent, whereas the estimates built by cross-validation on different subsamples (folds) are not.
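The abstract does not spell out the modeling scheme, so the following is only a minimal Python sketch of the quantities it compares, not the author's method. It assumes a hypothetical synthetic 1-D two-class distribution (the paper's point is precisely to use real data instead), a toy nearest-mean classifier, and root-mean-square deviation from the true error as the accuracy measure; the true error is itself approximated by a large independent sample. All names and parameters (`draw_sample`, `fit_nearest_mean`, `kfold_cv_error`, `repeated_cv_error`, `compare`, `n`, `k`, `repeats`, `trials`) are illustrative. Because the k fold estimates share training objects, they are correlated, which is the effect the abstract uses to explain why cross-validation behaves like a smaller independent test sample.

```python
import random
import statistics
from typing import List, Tuple

# (feature value, class label in {0, 1})
Sample = List[Tuple[float, int]]
Model = Tuple[float, float]  # (mean of class 0, mean of class 1)


def draw_sample(n: int, rng: random.Random, shift: float = 1.0) -> Sample:
    """Hypothetical synthetic distribution: equal priors,
    class 0 ~ N(0, 1), class 1 ~ N(shift, 1)."""
    sample = []
    for _ in range(n):
        y = rng.randrange(2)
        sample.append((rng.gauss(shift * y, 1.0), y))
    return sample


def fit_nearest_mean(train: Sample) -> Model:
    """Toy decision function: remember the mean feature value of each class."""
    overall = statistics.fmean(x for x, _ in train)
    means = []
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        # Fall back to the overall mean if a class is absent from this training set.
        means.append(statistics.fmean(xs) if xs else overall)
    return means[0], means[1]


def predict(model: Model, x: float) -> int:
    """Assign x to the class whose mean is nearer (ties go to class 0)."""
    m0, m1 = model
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def error_rate(model: Model, data: Sample) -> float:
    """Fraction of misclassified objects in `data`."""
    return sum(predict(model, x) != y for x, y in data) / len(data)


def kfold_cv_error(data: Sample, k: int, rng: random.Random) -> float:
    """K-fold cross-validation: each fold is classified by a model trained on
    the other k - 1 folds; errors are pooled over all objects."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # disjoint folds covering every index
    mistakes = 0
    for fold in folds:
        held_out = set(fold)
        model = fit_nearest_mean([data[i] for i in idx if i not in held_out])
        mistakes += sum(predict(model, data[i][0]) != data[i][1] for i in fold)
    return mistakes / len(data)


def repeated_cv_error(data: Sample, k: int, repeats: int, rng: random.Random) -> float:
    """Repeated k-fold CV: average of `repeats` runs with different random partitions."""
    return statistics.fmean(kfold_cv_error(data, k, rng) for _ in range(repeats))


def compare(n: int = 40, k: int = 5, repeats: int = 1, trials: int = 100,
            seed: int = 0) -> Tuple[float, float]:
    """Root-mean-square deviation from the (approximate) true error of
    (a) the k-fold CV estimate on a sample of size n and
    (b) the error on an independent test sample of size n // 2."""
    rng = random.Random(seed)
    reference = draw_sample(5000, rng)  # large sample standing in for the true distribution
    cv_dev, test_dev = [], []
    for _ in range(trials):
        train = draw_sample(n, rng)
        model = fit_nearest_mean(train)
        true_err = error_rate(model, reference)
        cv_dev.append(repeated_cv_error(train, k, repeats, rng) - true_err)
        test_dev.append(error_rate(model, draw_sample(n // 2, rng)) - true_err)

    def rms(devs: List[float]) -> float:
        return statistics.fmean(d * d for d in devs) ** 0.5

    return rms(cv_dev), rms(test_dev)


if __name__ == "__main__":
    for k in (2, 5, 10):
        cv_rms, test_rms = compare(k=k)
        print(f"k={k:2d}: CV RMS deviation {cv_rms:.3f}, "
              f"half-size test RMS deviation {test_rms:.3f}")
```

Varying `k` and `repeats` in `compare` lets one probe the kinds of trends the abstract describes; this toy setup is not claimed to reproduce the paper's numbers, and a faithful replication would need the real-data modeling scheme from the paper itself.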
Pages: 84-95
Page count: 12