THEORETICAL ANALYSES OF CROSS-VALIDATION ERROR AND VOTING IN INSTANCE-BASED LEARNING

Cited by: 5
Author
TURNEY, P
Affiliation
[1] Knowledge Systems Laboratory, Institute for Information Technology, National Research Council Canada, Ottawa, ON
Keywords
CROSS-VALIDATION; SIMPLICITY; BIAS; VARIANCE; VOTING; INSTANCE-BASED LEARNING
DOI
10.1080/09528139408953793
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training examples and a set of testing examples. The value of the attribute that is to be predicted is known to the learner in the training set, but unknown in the testing set. The theory demonstrates that cross-validation error has two components: error on the training set (inaccuracy) and sensitivity to noise (instability). This general theory is then applied to voting in instance-based learning. Given an example in the testing set, a typical instance-based learning algorithm predicts the designated attribute by voting among the k nearest neighbours (the k most similar examples) to the testing example in the training set. Voting is intended to increase the stability (resistance to noise) of instance-based learning, but a theoretical analysis shows that there are circumstances in which voting can be destabilising. The theory suggests ways to minimize cross-validation error by ensuring that voting is stable and does not adversely affect accuracy.
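The voting scheme described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the Hamming distance over symbolic attributes, the tie-breaking behaviour, the function names (`hamming`, `knn_vote`, `error_rate`), and the toy data are all assumptions made for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Count the attribute positions where two symbolic examples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_vote(train, query, k):
    """Predict a label for `query` by majority vote among the k training
    examples nearest to it. `train` is a list of (attributes, label) pairs."""
    # sorted() is stable, so equally distant examples keep training-set order.
    nearest = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Equal counts are ordered by first insertion, so a tied vote goes to
    # the label of the nearer neighbour (an assumed tie-breaking rule).
    return votes.most_common(1)[0][0]


def error_rate(train, test, k):
    """Fraction of testing examples whose predicted label is wrong."""
    return sum(knn_vote(train, x, k) != y for x, y in test) / len(test)


# Hypothetical toy data: the first training example carries a noisy label.
train = [
    (("red", "round", "small"), "pear"),   # noisy: should be "apple"
    (("red", "round", "large"), "apple"),
    (("red", "round", "large"), "apple"),
    (("green", "long", "large"), "pear"),
    (("green", "long", "large"), "pear"),
]
test = [
    (("red", "round", "small"), "apple"),
    (("green", "long", "small"), "pear"),
]

error_k1 = error_rate(train, test, k=1)  # single nearest neighbour
error_k3 = error_rate(train, test, k=3)  # vote among three neighbours
```

On this toy data the single nearest neighbour of the first testing example is the noisy one, so `error_k1` is 0.5, while the three-neighbour vote outvotes the noise and `error_k3` is 0.0. This shows only the stabilising case; the abstract notes that under other circumstances voting can be destabilising, which this sketch does not attempt to reproduce.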
Pages: 331 - 360
Number of pages: 30
Related papers
50 in total
  • [31] ON THE BIAS AND VARIABILITY OF BOOTSTRAP AND CROSS-VALIDATION ESTIMATES OF ERROR RATE IN DISCRIMINATION PROBLEMS
    DAVISON, AC
    HALL, P
    BIOMETRIKA, 1992, 79 (02) : 279 - 284
  • [32] Evaluation of Instance-Based Learning and Q-Learning Algorithms in Dynamic Environments
    Gupta, Anmol
    Roy, Partha Pratim
    Dutt, Varun
    IEEE ACCESS, 2021, 9 : 138775 - 138790
  • [33] A Comparison of Estimators for the Variance of Cross-Validation Estimators of the Generalization Error of Computer Algorithms
    Markatou, Marianthi
    Dimova, Rositsa
    Sinha, Anshu
    NONPARAMETRIC STATISTICS AND MIXTURE MODELS: A FESTSCHRIFT IN HONOR OF THOMAS P HETTMANSPERGER, 2011, : 226 - 251
  • [34] APPLICATION OF INSTANCE-BASED LEARNING FOR CAST IRON CASTING DEFECTS PREDICTION
    Sika, Robert
    Szajewski, Damian
    Hajkowski, Jakub
    Popielarski, Pawel
    MANAGEMENT AND PRODUCTION ENGINEERING REVIEW, 2019, 10 (04) : 101 - 107
  • [35] The role of inertia in modeling decisions from experience with instance-based learning
    Dutt, Varun
    Gonzalez, Cleotilde
    FRONTIERS IN PSYCHOLOGY, 2012, 3
  • [36] Instance-Based Learning: Integrating Sampling and Repeated Decisions From Experience
    Gonzalez, Cleotilde
    Dutt, Varun
    PSYCHOLOGICAL REVIEW, 2011, 118 (04) : 523 - 551
  • [37] A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning
    Szeghalmy, Szilvia
    Fazekas, Attila
    SENSORS, 2023, 23 (04)
  • [38] Novelty Detection with Instance-based Learning For Optical Character Quality Control
    Pei, Zhijun
    Zhang, Huaxia
    Ren, Haiyan
    2008 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2008, : 2277 - +
  • [39] SpeedyIBL: A comprehensive, precise, and fast implementation of instance-based learning theory
    Nguyen, Thuy Ngoc
    Phan, Duy Nhat
    Gonzalez, Cleotilde
    BEHAVIOR RESEARCH METHODS, 2023, 55 (04) : 1734 - 1757