A taxonomy on impact of label noise and feature noise using machine learning techniques

Cited by: 19
Authors
Shanthini, A. [1]
Vinodhini, G. [2]
Chandrasekaran, R. M. [2]
Supraja, P. [1]
Affiliations
[1] SRM Inst Sci & Technol, Dept Informat & Technol, Kattankulathur 603203, Tamil Nadu, India
[2] Annamalai Univ, Dept Comp Sci & Engn, Chidambaram, India
Keywords
Feature noise; Label noise; Machine learning; Boosting; Classification
DOI
10.1007/s00500-019-03968-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Soft computing techniques are effective for predicting noise in a dataset, which causes misclassification. Classification assumes perfect labeling, but noise in the data can corrupt the assigned labels and also the input feature values of the instances. Noise complicates prediction on real-world data and degrades classifier performance. This study aims at a quantitative assessment of the effect of label noise and feature noise on classification performance in medical datasets using machine learning, since noise handling has become an important aspect of research in data mining and its applications. Boosting of weak classifiers provides high accuracy in classification problems. The study explores the performance of recent soft computing techniques in machine learning, namely weak-learner-based boosting algorithms such as adaptive boosting, generalized tree boosting and extreme gradient boosting. These boosting algorithms are compared and analyzed at different label-noise and feature-noise levels (5%, 10%, 15% and 20%) on several medical datasets. Performance is measured in terms of accuracy and equalized loss of accuracy.
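The noise levels and the equalized loss of accuracy (ELA) named in the abstract can be illustrated with a minimal sketch. This record does not state how the authors injected noise or how they computed ELA, so the code below rests on two assumptions: labels are flipped uniformly at random to a different class, and ELA follows the definition usual in the noise-robustness literature, ELA = (100 - A_x%) / A_0%, where A_0% is accuracy without added noise and A_x% is accuracy at noise level x. The function names are hypothetical.

```python
import random


def inject_label_noise(labels, noise_rate, classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` of entries
    flipped to a different class (assumed uniform label noise)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Choose distinct positions so exactly n_flip labels are changed.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


def equalized_loss_of_accuracy(acc_clean, acc_noisy):
    """Assumed definition: ELA = (100 - A_x%) / A_0%, accuracies in percent.
    Lower values indicate a more accurate and more robust classifier."""
    return (100.0 - acc_noisy) / acc_clean


# The four noise levels listed in the abstract.
NOISE_LEVELS = (0.05, 0.10, 0.15, 0.20)

if __name__ == "__main__":
    labels = [0] * 50 + [1] * 50
    for rate in NOISE_LEVELS:
        noisy = inject_label_noise(labels, rate, classes=[0, 1])
        flipped = sum(a != b for a, b in zip(labels, noisy))
        print(f"noise rate {rate:.2f}: {flipped} labels flipped")
```

Under this definition, a classifier with 90% accuracy on clean data and 80% accuracy at some noise level has an ELA of 20 / 90, about 0.22. Feature noise could be handled analogously by perturbing a fraction of the input values instead of the labels.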
Pages: 8597-8607
Page count: 11