Application of noise-reduction techniques to machine learning algorithms for breast cancer tumor identification

Cited by: 10
Authors
Ahuja, Avani [1 ]
Al-Zogbi, Lidia [2 ]
Krieger, Axel [2 ]
Affiliations
[1] Georgetown Day High Sch Georgetown Day Sch, 4200 Davenport St NW, Washington, DC 20016 USA
[2] Johns Hopkins Univ, Dept Mech Engn, Baltimore, MD 21218 USA
Keywords
Outlier removal; Noise reduction; Dimensionality reduction; Classification; Breast cancer tumor; ACCURACY; NUMBER
DOI
10.1016/j.compbiomed.2021.104576
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The application of machine learning (ML) techniques to digitized images of biopsied cells for breast cancer diagnosis is an active area of research. We hypothesized that reducing noise in the data would increase classification accuracy. To test this hypothesis, we first compared several classification techniques in their ability to discriminate between malignant and benign breast cancer tumors using the Wisconsin Breast Cancer Data Set, and then evaluated the effect of noise-reduction techniques on model accuracy. We applied two noise-reduction techniques based on Principal Component Analysis (dimensionality reduction and outlier removal) to a comprehensive list of ML algorithms with different learning paradigms, including Decision Trees (fine, medium, coarse), dimensionality reduction techniques (Linear Discriminant Analysis, Quadratic Discriminant Analysis, Partial Least Squares-Discriminant Analysis), Logistic Regression, Bayesian techniques (Gaussian Naive, Kernel Naive), Support Vector Machines (Linear, Quadratic, Cubic, Gaussian), instance-based techniques (fine, medium, coarse, cosine, cubic, and weighted K-Nearest Neighbors), and Artificial Neural Networks. Results showed that noise removal through dimensionality reduction is most effective when the number of principal components is chosen by cross-validation, and that accuracies surpassing 99% are obtained across all ML models when both noise-reduction techniques are applied sequentially. Although such high accuracy has been demonstrated in a few instances for specific algorithms, the methodology proposed herein is the first published report demonstrating that a single technique can be applied to a wide range of ML models to achieve high accuracies. We show that dimensionality reduction and outlier analysis can be used as effective approaches to improve discrimination accuracy. In addition, dimensionality reduction with a cross-validated number of principal components can provide an effective framework for reducing noise in the data before applying an ML algorithm.
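The two PCA-based steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it is not expected to reproduce the reported accuracies. Several choices here are assumptions: scikit-learn is used as the toolkit, `load_breast_cancer` (the diagnostic variant of the Wisconsin data) stands in for the data set, logistic regression stands in for the full list of classifiers, and outliers are flagged with a Hotelling T² score and a 95th-percentile cutoff, since the abstract does not state the outlier criterion.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wisconsin diagnostic data set.
X, y = load_breast_cancer(return_X_y=True)

# Step 1: dimensionality reduction with a cross-validated number of
# principal components. Logistic regression is a stand-in classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, {"pca__n_components": list(range(1, 16))}, cv=cv)
search.fit(X, y)
k = search.best_params_["pca__n_components"]
acc_before = search.best_score_

# Step 2: outlier removal in the retained principal-component space.
# Assumption: Hotelling T^2 score with a 95th-percentile cutoff.
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=k).fit(Z)
scores = pca.transform(Z)
t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
keep = t2 <= np.percentile(t2, 95)

# Re-evaluate the selected pipeline on the data with outliers removed.
acc_after = cross_val_score(
    search.best_estimator_, X[keep], y[keep], cv=cv
).mean()

print(f"components kept: {k}")
print(f"samples kept: {int(keep.sum())} of {len(y)}")
print(f"CV accuracy before outlier removal: {acc_before:.3f}")
print(f"CV accuracy after outlier removal:  {acc_after:.3f}")
```

In this sketch the outliers are identified on the full data set before cross-validation, which can make the second accuracy estimate optimistic. A stricter evaluation would fit the outlier filter inside each training fold only.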
Pages: 11