A Survey of mislabeled training data detection techniques for pattern classification

Cited by: 16
Authors
Guan, Donghai [1, 2]
Yuan, Weiwei [1, 3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[3] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Ensemble learning; Local learning; Mislabeled data detection; NEAREST-NEIGHBOR RULE; NOISE-DETECTION; ENSEMBLE; QUALITY; ELIMINATION; MICROARRAYS;
DOI
10.4103/0256-4602.125689
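Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;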
Abstract
Pattern classification is an important part of machine learning. A classifier is trained on the training data and then predicts labels for future, unseen data. The quality of the training data plays an important role in obtaining a classifier with good performance. Unfortunately, in many areas it is difficult to provide perfectly clean data. This paper focuses on mislabeled data, one of the main types of noisy data. A number of mislabeled data detection techniques have been proposed; however, no survey has yet summarized them. This paper reviews the existing studies and classifies them into three types: local learning-based, ensemble learning-based, and single learning-based methods. The technical details, advantages, and disadvantages of these methods are discussed.
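To make the first category concrete, the following is a minimal sketch of a local learning-based check, assuming a simple k-nearest-neighbor majority vote: a training sample is flagged as suspect when its label disagrees with the majority label of its k nearest neighbors. This is an illustration only, not the procedure of any specific method reviewed in the paper; the function name `knn_suspects`, the toy data, and the choice of k = 3 are all assumptions.

```python
from collections import Counter


def knn_suspects(points, labels, k=3):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbors (hypothetical helper, leave-one-out)."""
    suspects = []
    for i, p in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbor_labels).most_common(1)[0]
        if majority != labels[i]:
            suspects.append(i)
    return suspects


# Toy data (assumed): two well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
# Index 3 sits in the first cluster but carries the second cluster's label.
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

print(knn_suspects(points, labels, k=3))
```

An odd k with two classes avoids tied votes; with more classes or an even k, a tie-breaking rule would be needed.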
Pages: 524-530
Page count: 7