A Belief Theory Based Instance Selection Scheme for Label Noise and Outlier Detection from Breast Cancer Data

Cited by: 0
Authors
Faziludeen, Shameer [1]
Sankaran, Praveen [1]
Affiliations
[1] Natl Inst Technol Calicut, Kozhikode, Kerala, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT II | 2024 / Vol. 2010
Keywords
Support vector machine; belief theory; Dempster Shafer theory; evidential k nearest neighbours; k nearest neighbours; breast cancer; histopathology; FNAC image; CLASSIFICATION;
DOI
10.1007/978-3-031-58174-8_7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-world datasets are often corrupted by label noise and outliers in the training data. Certain classification algorithms, including the support vector machine (SVM), are sensitive to noisy and outlying samples, which can degrade their performance. Belief theory, which extends the general probabilistic model and uses combination rules for information fusion, has found good use in classifier design. In this paper, we propose a belief theory based instance selection (BIS) scheme that uses the k nearest neighbours (KNN) algorithm to remove outlier and noisy samples prior to SVM training, with the aim of improving classification performance on breast cancer FNAC (fine needle aspiration cytology) image data features. The algorithm is tested on the WBCD database from the UCI machine learning repository, which contains FNAC image data features. Performance is evaluated using accuracy and confusion matrix measures. The effect of noise is assessed by contaminating the training data with random mislabelling and then testing on the resulting datasets. Results are compared with the conventional SVM algorithm on both the noisy and noiseless datasets. The proposed BIS scheme is shown to improve the performance of the SVM classifier considerably under noisy conditions.
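The abstract does not give the details of the BIS scheme, so the following is only a minimal sketch of the general idea, assuming the evidential k nearest neighbours formulation (each neighbour contributes a simple mass function, fused with Dempster's rule). The function names, the parameters `alpha`, `gamma`, and `max_ignorance`, and the two rejection rules are illustrative assumptions, not the authors' method. The `flip_labels` helper likewise assumes binary 0/1 labels and a fixed mislabelling rate.

```python
import math
import random

OMEGA = "omega"  # key for mass assigned to the whole frame (ignorance)


def flip_labels(y, rate, rng):
    """Randomly mislabel a fraction `rate` of binary (0/1) labels."""
    y_noisy = list(y)
    for i in rng.sample(range(len(y)), int(rate * len(y))):
        y_noisy[i] = 1 - y_noisy[i]
    return y_noisy


def combine(m1, m2, classes):
    """Dempster's rule for masses on singleton classes plus the whole frame."""
    out = {c: m1[c] * m2[c] + m1[c] * m2[OMEGA] + m1[OMEGA] * m2[c]
           for c in classes}
    out[OMEGA] = m1[OMEGA] * m2[OMEGA]
    total = sum(out.values())  # equals 1 minus the conflict
    return {key: value / total for key, value in out.items()}


def neighbour_mass(X, y, i, classes, k, alpha=0.95, gamma=1.0):
    """Fuse the evidence from the k nearest neighbours of sample i."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    mass = {c: 0.0 for c in classes}
    mass[OMEGA] = 1.0  # start from total ignorance
    for d, j in dists[:k]:
        support = alpha * math.exp(-gamma * d * d)  # closer means stronger
        m = {c: 0.0 for c in classes}
        m[y[j]] = support
        m[OMEGA] = 1.0 - support
        mass = combine(mass, m, classes)
    return mass


def select_instances(X, y, k=5, max_ignorance=0.5):
    """Return indices of samples kept for training (assumed rejection rules)."""
    classes = sorted(set(y))
    keep = []
    for i in range(len(X)):
        mass = neighbour_mass(X, y, i, classes, k)
        if mass[OMEGA] > max_ignorance:
            continue  # treated as an outlier: neighbours are all far away
        if max(classes, key=lambda c: mass[c]) != y[i]:
            continue  # treated as label noise: neighbours support another class
        keep.append(i)
    return keep


# Toy data: two tight clusters, one mislabelled point, one distant outlier.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (1, 1), (1.1, 1), (1, 1.1), (1.1, 1.1), (1.05, 1.05),
     (0.05, 0.02), (10, 10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
kept = select_instances(X, y, k=3)
```

In a full pipeline, the retained indices would be used to subset the training set before fitting an SVM (for example scikit-learn's `SVC`), and `flip_labels` would be applied to the training labels beforehand to simulate the noisy condition.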
Pages: 67-77
Number of pages: 11