Feature Selection and Comparative Analysis of Breast Cancer Prediction Using Clinical Data and Histopathological Whole Slide Images

Cited by: 0
Authors
Mohammed, Sarfaraz Ahmed [1]
Abeysinghe, Senuka [2]
Ralescu, Anca [1]
Affiliations
[1] Univ Cincinnati, Dept Comp Sci, Cincinnati, OH 45221 USA
[2] Indian Hill High Sch, Ohios Coll, Credit Plus Program, Cincinnati, OH 45243 USA
Source
ADVANCES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING | 2023, Vol. 3, Issue 03
Keywords
Breast cancer; Machine learning; Principal component analysis; Particle swarm optimization; Feature selection; Logistic regression; Naïve Bayes classification; k-NN; Support vector machines; Random forest; K-Means; Whole slide images; TCGA; Histopathology; Deep learning; Digital image analysis; Convolutional neural network; H&E-stained images; Nuclei segmentation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Breast carcinoma is a common cancer among women, with invasive ductal carcinoma and lobular carcinoma being the two most frequent types. Early detection is critical to stop the disease from progressing. Diagnostic tests include mammography, ultrasound, MRI, and biopsy. Machine learning algorithms can play a key role in analyzing complex clinical datasets to predict disease outcomes. This study applies machine learning and deep learning techniques to publicly available clinical and medical image data. For the clinical data, Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) are applied to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset for feature selection, and the performance of each approach in distinguishing between benign and malignant tumors is evaluated. The results show that the Random Forest (RF) classifier outperforms the other classification algorithms under both PSO and PCA feature selection, achieving predictive accuracies of 95.7% and 97.2%, respectively. The first part of the paper contains a comprehensive analysis of the two feature selection methods on clinical data to optimize predictive performance. The second part of the paper is concerned with image data. Although histopathological Whole Slide Imaging (WSI) has been validated for a variety of pathological applications over more than two decades, manual detection of cancerous tumors remains challenging and prone to human error. Given the potential of deep learning models to aid pathologists in detecting cancer subtypes, and the increasing ability of current image analysis techniques to predict underlying genomic data and cancer-causing mutations, the second half of the paper focuses on feature extraction using a deep convolutional neural network (U-Net) trained on WSIs from The Cancer Genome Atlas (TCGA) to accurately classify and extract relevant features.
The focus is on feature extraction, nuclei-based instance segmentation, H&E-stained image extraction, and quantification of intensity information for a given WSI to classify the disease type. A comprehensive analysis of feature selection methods is presented for both clinical and medical image data.
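The clinical-data part of the abstract (dimensionality reduction followed by a Random Forest classifier on WDBC) can be illustrated with a minimal Python sketch. It assumes scikit-learn and its bundled copy of the WDBC data (`load_breast_cancer`). The 70/30 split, the 10 principal components, and the 200 trees are illustrative choices, not the settings used in the paper, and the sketch does not attempt to reproduce the reported accuracies or the PSO variant.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# WDBC as bundled with scikit-learn: 569 samples, 30 numeric features,
# binary target (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples for evaluation (assumed split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize, project onto 10 principal components, then classify.
# n_components=10 and n_estimators=200 are assumptions for illustration only.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training split only, so the held-out score is not influenced by the test samples.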
Pages: 1494-1525
Number of pages: 32