Distribution-balanced stratified cross-validation for accuracy estimation

Cited by: 157
Authors
Zeng, XC [1]
Martinez, TR [1]
Affiliation
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
Keywords
cross-validation; machine learning research; true accuracy; classifier
DOI
10.1080/095281300146272
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision trees classifier. The results show that DBSCV performs better (has smaller biases) than the regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
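The abstract does not spell out how folds are balanced within a class, so the following is only a minimal sketch of one way to do it, not the authors' published procedure. It assumes numeric feature vectors, Euclidean distance, and a nearest-neighbour walk inside each class that deals instances to folds in rotation, so that nearby instances (for example, members of the same intraclass cluster) land in different folds. The function name `dbscv_folds` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def dbscv_folds(X, y, k, seed=0):
    """Return a fold index (0..k-1) for every instance.

    Hypothetical helper: an illustrative take on distribution-balanced
    stratified partitioning, not the authors' reference implementation.
    """
    rng = random.Random(seed)

    # Group instance indices by class label (the stratification step).
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    fold_of = [None] * len(y)
    next_fold = 0  # carried across classes so overall fold sizes stay even
    for label in sorted(by_class):
        remaining = set(by_class[label])
        current = rng.choice(sorted(remaining))  # random starting instance
        while True:
            fold_of[current] = next_fold
            next_fold = (next_fold + 1) % k
            remaining.discard(current)
            if not remaining:
                break
            # Assumption: move to the closest unassigned instance of the
            # same class, so neighbours are spread over consecutive folds.
            current = min(
                sorted(remaining),
                key=lambda j: math.dist(X[current], X[j]),
            )
    return fold_of
```

With one class made of two well-separated clusters, for example `X = [(0,), (1,), (2,), (3,), (100,), (101,), (102,), (103,)]` and `k=2`, each fold receives two instances from each cluster, whereas plain stratified partitioning only guarantees the per-class counts.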
Pages: 1-12
Page count: 12