A taxonomy on impact of label noise and feature noise using machine learning techniques

Cited by: 19
Authors
Shanthini, A. [1]
Vinodhini, G. [2]
Chandrasekaran, R. M. [2]
Supraja, P. [1]
Affiliations
[1] SRM Inst Sci & Technol, Dept Informat & Technol, Kattankulathur 603203, Tamil Nadu, India
[2] Annamalai Univ, Dept Comp Sci & Engn, Chidambaram, India
Keywords
Feature noise; Label noise; Machine learning; Boosting; Classification
DOI
10.1007/s00500-019-03968-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Soft computing techniques are effective at predicting noise in datasets, which otherwise causes misclassification. Classification assumes perfectly labeled data, but noise can corrupt both the labels assigned to instances and the values of their input features. Noise in real-world data complicates prediction and degrades classifier performance. Noise handling has become an important aspect of research in data mining and its applications. The present study therefore quantitatively assesses how label noise and feature noise affect machine learning classification performance on medical datasets. Boosting weak classifiers yields high accuracy on classification problems. This study explores recent soft computing techniques in machine learning, namely weak learner-based boosting algorithms such as adaptive boosting, generalized tree boosting and extreme gradient boosting. It compares and analyzes these boosting algorithms at different levels of label noise and feature noise (5%, 10%, 15% and 20%) on distinct medical datasets. The performance of the weak learner-based boosting algorithms is measured in terms of accuracy and equalized loss of accuracy.
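The abstract describes injecting label noise and feature noise at 5%, 10%, 15% and 20% and scoring classifiers with accuracy and the equalized loss of accuracy (ELA). The following is a minimal, stdlib-only sketch of that evaluation setup, written under stated assumptions. The abstract does not say how noise was injected, so both schemes here are assumptions:

- **Label noise:** uniform class flipping.
- **Feature noise:** replacing values with uniform draws from the feature's observed range.

The function names are hypothetical. The ELA follows the definition of Sáez, Luengo and Herrera (2016): ELA at noise level x% equals (100 - A_x%) / A_0%, where A_0% is the accuracy on clean data and A_x% the accuracy at x% noise (accuracies in percent; lower is better).

```python
import random
from typing import List, Sequence


def inject_label_noise(labels: Sequence, rate: float, seed: int = 0) -> list:
    """Assumed scheme: flip round(rate * n) randomly chosen labels to a
    different class picked uniformly at random (uniform class noise)."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes to flip labels")
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    # sample distinct indices so exactly n_flip labels change
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


def inject_feature_noise(X: List[List[float]], rate: float, seed: int = 0) -> List[List[float]]:
    """Assumed scheme: for each feature, replace its value in round(rate * n)
    randomly chosen rows with a uniform draw from that feature's [min, max]."""
    rng = random.Random(seed)
    noisy = [list(row) for row in X]  # copy so the clean data stays intact
    n_rows = len(noisy)
    n_corrupt = round(rate * n_rows)
    for j in range(len(noisy[0])):
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        for i in rng.sample(range(n_rows), n_corrupt):
            noisy[i][j] = rng.uniform(lo, hi)
    return noisy


def equalized_loss_of_accuracy(acc_noisy: float, acc_clean: float) -> float:
    """ELA_x% = (100 - A_x%) / A_0%, with accuracies given in percent."""
    if acc_clean <= 0:
        raise ValueError("clean accuracy must be positive")
    return (100.0 - acc_noisy) / acc_clean


if __name__ == "__main__":
    y = [0, 1] * 50
    for level in (5, 10, 15, 20):
        y_noisy = inject_label_noise(y, level / 100, seed=level)
        changed = sum(a != b for a, b in zip(y, y_noisy))
        print(f"{level}% label noise: {changed} labels flipped")
```

In a full experiment, each boosting model (for example AdaBoost or XGBoost) would be trained on the noisy copy at each level and tested on clean data. Its accuracy A_x% would then be passed to `equalized_loss_of_accuracy` together with the clean-data accuracy A_0%. Because ELA divides by A_0%, it rewards classifiers that are both accurate without noise and robust as noise increases, rather than robustness alone.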
Pages: 8597-8607
Number of pages: 11