Selecting Features for Breast Cancer Analysis and Prediction

Cited by: 7
Authors
Ray, Sujan [1]
AlGhamdi, Ali [1,2]
Alshouiliy, Khaldoon [1]
Agrawal, Dharma P. [1]
Affiliations
[1] Univ Cincinnati, EECS, Ctr Distributed & Mobile Comp, Cincinnati, OH 45221 USA
[2] AlBaha Univ, Dept Comp Sci & Informat Technol, Al Agig, Saudi Arabia
Source
PROCEEDINGS OF THE 2020 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING AND COMMUNICATION ENGINEERING (ICACCE-2020) | 2020
Keywords
Apache Spark; Big Data; Breast Cancer; Data Pre-processing; Decision Tree; Healthcare; Machine Learning; Normalization; PCA; Random Forest; Wisconsin Diagnosis Breast Cancer Dataset
DOI
10.1109/icacce49060.2020.9154919
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Breast Cancer (BC) is the second most common cancer in women after skin cancer and has become a major health issue. It is therefore very important to diagnose BC correctly and to categorize tumors as malignant or benign. Machine Learning (ML) techniques have unique advantages, which is why they are widely used to analyze complex BC datasets and predict the disease. Researchers in this field have used the Wisconsin Diagnosis Breast Cancer (WDBC) dataset to develop predictive models for BC. The dataset has 569 instances and 32 features. In this paper, we propose a method for analyzing and predicting BC on the same dataset using Apache Spark. This big data framework is a very powerful tool for working on huge volumes of data, such as healthcare data [4]. Principal Component Analysis (PCA) is applied to the dataset to select the most important features. We run experiments with the top 6 and top 10 features. The experiments are executed on a Hadoop cluster, a cloud platform provided by the Electrical Engineering and Computer Science (EECS) department of the University of Cincinnati. We also compare the performance of two machine learning techniques: Decision Tree and Random Forest Classifier. We set the performance of Decision Tree with the top 10 features as the benchmark in our work. The Random Forest Classifier performs better than the Decision Tree algorithm with the top 6 as well as the top 10 features. Random Forest achieves 97.52% accuracy using the top 10 features. Our results show that selecting the right features significantly improves accuracy in predicting BC.
Pages: 6
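The abstract's normalization and PCA steps can be illustrated with a minimal sketch. This is not the authors' Apache Spark pipeline: it is plain standard-library Python run on synthetic data (not WDBC), and the function names (`standardize`, `covariance`, `top_components`, `project`) are hypothetical. It standardizes each column, then finds the top-k principal components by power iteration with deflation. Note that PCA yields derived components (linear combinations of the original columns) rather than a subset of the original features.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score normalization)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    work = [row[:] for row in cov]  # copy so the caller's matrix is untouched
    components, eigenvalues = [], []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        eigenvalues.append(lam)
        # Deflate: remove the found component so the next pass finds the next one.
        for i in range(d):
            for j in range(d):
                work[i][j] -= lam * v[i] * v[j]
    return components, eigenvalues


def project(rows, components):
    """Project each row onto the given principal components."""
    return [[sum(r[j] * c[j] for j in range(len(c))) for c in components]
            for r in rows]


# Synthetic stand-in data (an assumption, not WDBC): 5 columns driven by 2 latent factors.
rng = random.Random(42)
data = []
for _ in range(200):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append([a, a + 0.1 * rng.gauss(0, 1), a + 0.1 * rng.gauss(0, 1),
                 b, b + 0.5 * rng.gauss(0, 1)])

scaled = standardize(data)
cov = covariance(scaled)
components, eigenvalues = top_components(cov, k=2)
reduced = project(scaled, components)

total_variance = sum(cov[i][i] for i in range(len(cov)))
share = sum(eigenvalues) / total_variance
print(f"{len(reduced[0])} components keep {share:.1%} of the variance")
```

In a setting like the one the abstract describes, the reduced matrix (here `reduced`) would be what gets passed to the classifiers, with the number of components (6 or 10 in the paper) chosen by comparing downstream accuracy.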
Related papers
50 in total
[41]   Development and validation of a prediction model for the diagnosis of breast cancer based on clinical and ultrasonic features [J].
He, Xuan ;
Lu, Yuanyuan ;
Li, Junlai .
GLAND SURGERY, 2023, 12 (06) :736-+
[42]   The Role of Linear Discriminant Analysis for Accurate Prediction of Breast Cancer [J].
Jessica, Egwom Onyinyechi ;
Hamada, Mohamed ;
Yusuf, Saratu Ilu ;
Hassan, Mohammed .
2021 IEEE 14TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2021), 2021, :340-344
[43]   Prediction of Histological Grade in Breast Cancer by Combining DCE-MRI and DWI Features [J].
Zhao, Wenrui ;
Fan, Ming ;
Xu, Maosheng ;
Li, Lihua .
MEDICAL IMAGING 2019: IMAGING INFORMATICS FOR HEALTHCARE, RESEARCH, AND APPLICATIONS, 2019, 10954
[44]   Prediction models of breast cancer molecular subtypes based on multimodal ultrasound and clinical features [J].
Li, Hui ;
Zhang, Chang-tao ;
Shao, Hua-guo ;
Pan, Lin ;
Li, Zhongyun ;
Wang, Min ;
Xu, Shi-hao .
BMC CANCER, 2025, 25 (01)
[45]   DCE-MRI Texture Features for Early Prediction of Breast Cancer Therapy Response [J].
Thibault, Guillaume ;
Tudorica, Alina ;
Afzal, Aneela ;
Chui, Stephen Y-C ;
Naik, Arpana ;
Troxell, Megan L. ;
Kemmer, Kathleen A. ;
Oh, Karen Y. ;
Roy, Nicole ;
Jafarian, Neda ;
Holtorf, Megan L. ;
Huang, Wei ;
Song, Xubo .
TOMOGRAPHY, 2017, 3 (01) :23-32
[46]   Enhancing Pathological Complete Response Prediction in Breast Cancer: The Added Value of Pretherapeutic Contrast-Enhanced Cone Beam Breast CT Semantic Features [J].
Wang, Yafei ;
Ma, Yue ;
Wang, Fang ;
Liu, Aidi ;
Zhao, Mengran ;
Bian, Keyi ;
Zhu, Yueqiang ;
Yin, Lu ;
Ye, Zhaoxiang .
ACADEMIC RADIOLOGY, 2025, 32 (06) :3191-3205
[47]   Features of aggressive breast cancer [J].
Arpino, Grazia ;
Milano, Monica ;
De Placido, Sabino .
BREAST, 2015, 24 (05) :594-600
[48]   Prediction of molecular subtypes of breast cancer using BI-RADS features based on a "white box" machine learning approach in a multi-modal imaging setting [J].
Wu, Mingxiang ;
Zhong, Xiaoling ;
Peng, Quanzhou ;
Xu, Mei ;
Huang, Shelei ;
Yuan, Jialin ;
Ma, Jie ;
Tan, Tao .
EUROPEAN JOURNAL OF RADIOLOGY, 2019, 114 :175-184
[49]   Artificial intelligence in breast cancer survival prediction: a comprehensive systematic review and meta-analysis [J].
Javanmard, Zohreh ;
Shahraki, Saba Zarean ;
Safari, Kosar ;
Omidi, Abbas ;
Raoufi, Sadaf ;
Rajabi, Mahsa ;
Akbari, Mohammad Esmaeil ;
Aria, Mehrad .
FRONTIERS IN ONCOLOGY, 2025, 14
[50]   Advancements in Breast Cancer Detection using Machine Learning Techniques for Early and Accurate Prediction [J].
Kumar, S. Senthil ;
Kathiresan, V ;
Gomathi, R. ;
Karthik, R. ;
Priya, S. Sugantha ;
Jasmine, C. Naveena .
2024 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT CYBER PHYSICAL SYSTEMS AND INTERNET OF THINGS, ICOICI 2024, 2024, :730-737