Distribution-balanced stratified cross-validation for accuracy estimation

Cited by: 157
Authors
Zeng, XC [1]
Martinez, TR [1]
Affiliations
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
Keywords
cross-validation; machine learning research; true accuracy; classifier
DOI
10.1080/095281300146272
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-validation is widely used in machine learning research to estimate the accuracy of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the quality of the estimate by providing balanced intraclass distributions when a data set is partitioned into multiple folds. We tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on the three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
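The abstract does not spell out the partitioning procedure, so the following is only a minimal sketch of one way to obtain balanced intraclass distributions, not necessarily the authors' exact method. It assumes numeric feature vectors, Euclidean distance, and a nearest-neighbor chaining rule: within each class, samples are visited by repeatedly jumping to the closest unassigned sample of the same class and are dealt to the folds in round-robin order. The function name `dbscv_folds` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def dbscv_folds(X, y, k, seed=0):
    """Return a fold index (0..k-1) for every sample (hypothetical helper).

    Assumption: samples of one class are chained by nearest neighbour and
    dealt to folds round-robin, so close neighbours land in different folds.
    """
    rng = random.Random(seed)

    # Group sample indices by class label (plain stratification step).
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    fold_of = [None] * len(y)
    next_fold = 0  # carried across classes so overall fold sizes stay even
    for indices in by_class.values():
        remaining = set(indices)
        current = rng.choice(indices)  # random starting sample of this class
        while True:
            fold_of[current] = next_fold
            next_fold = (next_fold + 1) % k
            remaining.discard(current)
            if not remaining:
                break
            # Closest unassigned sample of the same class; ties go to the
            # smaller index so the result is reproducible.
            current = min(
                remaining,
                key=lambda j: (math.dist(X[current], X[j]), j),
            )
    return fold_of
```

For a class made of two well-separated clusters of four points each and `k=2`, the chain exhausts one cluster before jumping to the other, so each fold receives two points from each cluster. Plain stratified splitting only balances class counts and gives no such guarantee about clusters inside a class.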
Pages: 1 / 12
Number of pages: 12