Distribution-balanced stratified cross-validation for accuracy estimation

Cited by: 157
Authors
Zeng, XC [1]
Martinez, TR [1]
Affiliation
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
Keywords
cross-validation; machine learning research; true accuracy; classifier
DOI
10.1080/095281300146272
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on the three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
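The abstract does not spell out how the folds are built, so the following is only a minimal sketch of one way to obtain balanced intraclass distributions, not necessarily the authors' procedure. It assumes that, within each class, instances are visited along a nearest-neighbor chain (so neighbors in feature space are adjacent in the ordering) and then dealt round-robin into the folds. The function name `dbscv_folds`, its parameters, and the use of Euclidean distance are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def dbscv_folds(X, y, k, seed=0):
    """Hypothetical helper: split instance indices into k folds.

    Assumption (not taken from the abstract): within each class, instances
    are ordered by a nearest-neighbor chain and dealt round-robin to folds,
    so every intraclass cluster is spread roughly evenly across folds.
    """
    rng = random.Random(seed)

    # Group instance indices by class label (the stratification step).
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    folds = [[] for _ in range(k)]
    next_fold = 0
    for label in sorted(by_class):
        remaining = by_class[label][:]
        # Start the chain from a random instance of this class.
        current = remaining.pop(rng.randrange(len(remaining)))
        while True:
            folds[next_fold].append(current)
            next_fold = (next_fold + 1) % k
            if not remaining:
                break
            # Move to the closest unassigned instance of the same class.
            nearest = min(remaining, key=lambda j: math.dist(X[current], X[j]))
            remaining.remove(nearest)
            current = nearest
    return folds


# Toy data: each class has two well-separated clusters on a line.
X = [(float(v),) for v in [0, 1, 2, 3, 10, 11, 12, 13]] * 2
y = [0] * 8 + [1] * 8
folds = dbscv_folds(X, y, k=2, seed=1)
```

On this toy data each fold receives the same number of instances from every cluster of every class, which a purely random stratified split only guarantees per class, not per cluster.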
Pages: 1-12
Page count: 12