BCD-WERT: a novel approach for breast cancer detection using whale optimization based efficient features and extremely randomized tree algorithm

Cited by: 73
Authors
Abbas, Shafaq [1]
Jalil, Zunera [2]
Javed, Abdul Rehman [2]
Batool, Iqra [1]
Khan, Mohammad Zubair [3]
Noorwali, Abdulfattah [4]
Gadekallu, Thippa Reddy [5]
Akbar, Aqsa [1]
Affiliations
[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Air Univ, Dept Cyber Secur, Islamabad, Pakistan
[3] Taibah Univ, Coll Comp Sci & Engn, Dept Comp Sci, Madinah, Saudi Arabia
[4] Umm Al Qura Univ, Elect Engn Dept, Mecca, Saudi Arabia
[5] Vellore Inst Technol Univ, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Keywords
Breast cancer; Machine learning; Whale optimization algorithm; Support vector machine; Particle swarm optimization; Machine-learning algorithms; Feature selection; K-means; Prediction
DOI
10.7717/peerj-cs.390
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for patients, who have to go through expensive and painful treatments to fight the disease. One in eight women worldwide is affected by breast cancer, and almost half a million women die from it every year. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on a state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all of them with the highest accuracy of 99.30%, followed by SVM at 98.60%. The experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.
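The abstract describes WOA as a wrapper that searches for a reduced feature subset before classification. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It uses only the Python standard library, a synthetic toy dataset, and a nearest-centroid scorer as a stand-in for the Extremely Randomized Tree classifier. All function names (make_toy_data, centroid_accuracy, fitness, to_mask, woa_select), the size penalty, and the threshold rule that turns a continuous whale position into a binary mask are hypothetical choices made for illustration.

```python
import math
import random


def make_toy_data(n=120, n_features=8, n_informative=3, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Stand-in scorer (NOT Extra Trees): nearest-centroid training accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        d = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in (0, 1)}
        correct += int(min(d, key=d.get) == t)
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def to_mask(position):
    """Keep feature j when sigmoid(position[j]) > 0.5, i.e. position[j] > 0."""
    return [1 if p > 0 else 0 for p in position]


def woa_select(X, y, n_whales=10, n_iter=30, seed=1):
    """Whale Optimization Algorithm over continuous positions, scored as masks."""
    rng = random.Random(seed)
    dim = len(X[0])
    whales = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(X, y, to_mask(w)))[:]
    best_fit = fitness(X, y, to_mask(best))
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # shrinks linearly from 2 towards 0
        for i, w in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale when |A| < 1, otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [ref[j] - A * abs(C * ref[j] - w[j]) for j in range(dim)]
            else:
                # Spiral (bubble-net) move towards the best whale, with b = 1.
                l = rng.uniform(-1, 1)
                new = [abs(best[j] - w[j]) * math.exp(l) * math.cos(2 * math.pi * l)
                       + best[j] for j in range(dim)]
            whales[i] = new
            f = fitness(X, y, to_mask(new))
            if f > best_fit:
                best, best_fit = new[:], f
    return to_mask(best), best_fit


X, y = make_toy_data()
mask, score = woa_select(X, y)
print("selected feature mask:", mask, "fitness:", round(score, 3))
```

In a setup closer to the one the abstract describes, the stand-in scorer would be replaced by the cross-validated accuracy of an Extremely Randomized Tree classifier trained on the masked features. The population size, iteration count, and penalty here are arbitrary and are not taken from the paper.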
Pages: 1-20
Page count: 20