A Survey of Mislabeled Training Data Detection Techniques for Pattern Classification

Cited by: 16
Authors
Guan, Donghai [1, 2]
Yuan, Weiwei [1, 3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[3] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Peoples R China
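Funding
National Natural Science Foundation of China
Keywords
Ensemble learning; Local learning; Mislabeled data detection; NEAREST-NEIGHBOR RULE; NOISE-DETECTION; ENSEMBLE; QUALITY; ELIMINATION; MICROARRAYS
DOI
10.4103/0256-4602.125689
Chinese Library Classification (CLC) number
TM [Electrotechnics]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract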
Pattern classification is an important part of machine learning. A classifier is first trained on labeled training data and then used to predict the labels of unseen data. The quality of the training data plays an important role in obtaining a classifier that performs well. Unfortunately, in many domains it is difficult to obtain completely clean data. This paper focuses on mislabeled data, one of the main types of noisy data. A number of mislabeled data detection techniques have been proposed, but no survey has yet summarized them. This paper reviews the existing studies and classifies them into three types: local learning-based, ensemble learning-based, and single learning-based methods. The technical details, advantages, and disadvantages of these methods are discussed.
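To make the first two categories more concrete, the following is a minimal Python sketch (standard library only, Python 3.8+ for `math.dist`) of one well-known representative of each. Wilson-style edited nearest neighbor serves as the local learning-based detector: it flags a training instance whose label disagrees with the majority vote of its k nearest neighbors. Majority/consensus filtering serves as the ensemble learning-based detector: it flags an instance when more than half (or all) of several different classifiers, trained on the other cross-validation folds, misclassify it. These are illustrative choices, not necessarily the exact variants the survey analyzes. The function names, the round-robin fold split, and the three base learners (1-NN, 3-NN, nearest centroid) are assumptions made for this example.

```python
import math
from collections import Counter

# All function and variable names below are hypothetical, chosen for this sketch.


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x (Euclidean)."""
    # Stable sort: points at equal distance keep their training order.
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nearest_centroid_predict(train_X, train_y, x):
    """Label of the class whose mean vector is closest to x."""
    counts = Counter(train_y)
    sums = {}
    for xi, yi in zip(train_X, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in total] for c, total in sums.items()}
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def enn_flags(X, y, k=3):
    """Local learning-based detection (Wilson-style editing):
    flag i if the k-NN vote of the other instances disagrees with y[i]."""
    flagged = []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]  # leave instance i out
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            flagged.append(i)
    return flagged


def ensemble_filter_flags(X, y, learners, n_folds=5, consensus=False):
    """Ensemble learning-based detection (majority or consensus filter).
    Each instance is judged only by learners trained on the other folds."""
    n = len(X)
    # Majority: more than half of the learners err; consensus: all of them err.
    needed = len(learners) if consensus else len(learners) // 2 + 1
    flagged = []
    for fold in range(n_folds):
        # Assumed round-robin split: instance i belongs to fold i % n_folds.
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in range(fold, n, n_folds):
            errors = sum(learn(train_X, train_y, X[i]) != y[i] for learn in learners)
            if errors >= needed:
                flagged.append(i)
    return sorted(flagged)


# Three heterogeneous base learners (an assumption for this sketch).
LEARNERS = [
    lambda tX, ty, x: knn_predict(tX, ty, x, 1),
    lambda tX, ty, x: knn_predict(tX, ty, x, 3),
    nearest_centroid_predict,
]

if __name__ == "__main__":
    # Two well-separated clusters; (0.5, 0.5) lies in cluster 0 but is labeled 1.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    print(enn_flags(X, y, k=3))                   # [4]
    print(ensemble_filter_flags(X, y, LEARNERS))  # [4]
```

In this toy data the 1-NN learner alone misjudges the corner points of cluster 0, because the mislabeled point is their nearest neighbor, but the vote of the other two learners overrides it. This illustrates why ensemble filters combine several learners rather than trusting one. Consensus filtering is typically the more conservative setting: it flags fewer instances, trading some missed mislabeled points for a lower risk of discarding correctly labeled but atypical ones.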
Pages: 524-530
Number of pages: 7