Breast Cancer Classification Based on DNA Microarray Analysis

Cited by: 0
Authors
El-Rahman, Sahar A. [1 ]
Alluhaidan, Ala Saleh D. [2 ]
Marzouk, Radwa [2 ]
Affiliations
[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 13511, Egypt
[2] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, Riyadh 11671, Saudi Arabia
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Genetic sequences; big data analysis; machine learning algorithms; breast cancer classification; breast cancer prediction; BIG DATA; ANALYTICS;
DOI
10.1109/ACCESS.2023.3334678
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Objective: Predicting whether a breast cancer patient will survive has long been a difficult research problem. Since the earliest work on the problem, significant progress has been made in several related areas: pioneering biomedical technologies have emerged, low-cost computer hardware and software allow high-quality data to be gathered and stored automatically, and better analytical methods allow that massive data to be processed efficiently and effectively. The objective of this paper is to report on a research project that takes advantage of these technological developments to build predictive models of breast cancer, that is, of whether the disease is present or not. Methods and materials: Artificial neural network, support vector machine, decision tree, naive Bayes, and random forest algorithms are used, along with the most common statistical method (logistic regression), to build prediction models on a large data set. The holdout method is also used. To account for the unbalanced nature of the classes, the performance evaluation parameters are predefined. Results: The Decision Tree (DT) is the top predictor with 89.1% accuracy on the holdout sample, surpassing all prediction accuracies reported in the literature; Artificial Neural Networks (ANN) came second with 88.9% accuracy, Naive Bayes (NB) third with 83.3%, Support Vector Machines (SVM) fourth with 83.2%, and the Random Forest (RF) models lowest of the five with 71.2%. Conclusion: A comparative study of multiple predictive models for breast cancer survival, using a large data set and 5-fold cross-validation, gave insight into the relative predictive ability of different data extraction methods.
After analyzing the data, we conclude that the model can help those who need it by predicting whether or not they have breast cancer. Furthermore, the proposed framework is a valuable tool in cancer research and clinical practice.
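This record does not detail the holdout evaluation or the evaluation parameters chosen with class imbalance in mind. As a minimal sketch only, assuming a binary label (1 = cancer present, 0 = absent) and a stratified split, the following shows why plain accuracy can mislead on unbalanced classes. The function names `stratified_holdout` and `binary_metrics`, the 30% test fraction, and the synthetic labels are all illustrative assumptions, not the authors' setup.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test parts with similar class proportions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Hold out roughly test_fraction of each class (at least one sample).
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy plus metrics that are less sensitive to imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Synthetic, unbalanced labels: 20 positive and 80 negative cases (assumed data).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.3, seed=0)

# A trivial baseline that always predicts the majority class (0).
y_true = [labels[i] for i in test_idx]
y_pred = [0] * len(y_true)
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this synthetic split the majority-class baseline reaches 80% accuracy while its balanced accuracy is only 50%, which is the kind of gap that motivates fixing imbalance-aware evaluation measures before comparing classifiers.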
Pages: 138748 - 138758
Page count: 11
Related papers
50 in total
  • [21] Prediction of Breast Cancer, Comparative Review of Machine Learning Techniques, and Their Analysis
    Fatima, Noreen
    Liu, Li
    Hong, Sha
    Ahmed, Haroon
    IEEE ACCESS, 2020, 8 : 150360 - 150376
  • [22] Construction and study of breast cancer prediction model based on machine learning
    Zhang, Yichen
    PROCEEDINGS OF 2023 4TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE FOR MEDICINE SCIENCE, ISAIMS 2023, 2023, : 523 - 530
  • [23] Kernel-based naive Bayes classifier for breast cancer prediction
    Nahar, Jesmin
    Chen, Yi-Ping Phoebe
    JOURNAL OF BIOLOGICAL SYSTEMS, 2007, 15 (01) : 17 - 25
  • [24] VELM: a voting based ensemble learning model for breast cancer prediction
    Singh, Archana
    Kaswan, Kuldeep Singh
    Rajani
    PHYSICA SCRIPTA, 2025, 100 (02)
  • [25] An Analysis of Automated Visual Analysis Classification: Interactive Visualization Task Inference of Cancer Genomics Domain Experts
    Gramazio, Connor C.
    Huang, Jeff
    Laidlaw, David H.
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (08) : 2270 - 2283
  • [26] Predictive modeling for breast cancer classification in the context of Bangladeshi patients by use of machine learning approach with explainable AI
    Islam, Taminul
    Sheakh, Md. Alif
    Tahosin, Mst. Sazia
    Hena, Most. Hasna
    Akash, Shopnil
    Bin Jardan, Yousef A.
    Fentahunwondmie, Gezahign
    Nafidi, Hiba-Allah
    Bourhia, Mohammed
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [27] Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier
    Kumar, Mukesh
    Rath, Nitish Kumar
    Rath, Santanu Kumar
    JOURNAL OF BIOMEDICAL INFORMATICS, 2016, 60 : 395 - 409
  • [28] A Review on Computational Analysis of Big Data in Breast Cancer for Predicting Potential Biomarkers
    Shaikh, Nilofer
    Bapat, Sanket
    Karthikeyan, Muthukumarasamy
    Vyas, Renu
    CURRENT TOPICS IN MEDICINAL CHEMISTRY, 2022, 22 (21) : 1793 - 1810
  • [29] An enhanced soft-computing based strategy for efficient feature selection for timely breast cancer prediction: Wisconsin Diagnostic Breast Cancer dataset case
    Singh, Law Kumar
    Khanna, Munish
    Singh, Rekha
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (31) : 76607 - 76672
  • [30] Shrink: A Breast Cancer Risk Assessment Model Based on Medical Social Network
    Li, Ali
    Wang, Rui
    Xu, Lei
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017, : 1189 - 1196