Predicting Breast Cancer via Supervised Machine Learning Methods on Class Imbalanced Data

Cited by: 0
Authors
Rajendran, Keerthana [1]
Jayabalan, Manoj [1, 2]
Thiruchelvam, Vinesh [1]
Affiliations
[1] Asia Pacific University of Technology & Innovation, School of Computing, Kuala Lumpur, Malaysia
[2] Liverpool John Moores University, Faculty of Engineering and Technology, Liverpool, Merseyside, England
Keywords
Breast cancer; class imbalance; diagnosis; Bayesian network; model; risk; age
DOI
10.14569/IJACSA.2020.0110808
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Breast cancer, the second leading cause of fatality, is a widespread global health concern among women. Predicting its occurrence from risk factors paves the way for early diagnosis and faster, more efficient treatment. Although many predictive models for breast cancer have been developed, most were built from highly imbalanced data. Models trained on such data are usually biased towards the majority class, yet in cancer diagnosis it is crucial to correctly diagnose patients with cancer, who are often the minority class. This study applies three class-balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique, SMOTE), undersampling (SpreadSubsample) and a hybrid method (SMOTE combined with SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before constructing the supervised learning models. The algorithms employed are Naive Bayes, Bayesian Network, Random Forest and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was then tested on the validation data to determine the final predictive model. Classifier performance was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
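The three balancing strategies and the two threshold metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. All function and parameter names (`smote`, `spread_subsample`, `hybrid`, `sensitivity_specificity`, `smote_pct`, `spread`, `k`) are hypothetical. SpreadSubsample is the name of a Weka filter, and the `spread` parameter here assumes behaviour similar to that filter's maximum class-distribution spread (1.0 meaning equal class sizes). The toy data are invented for illustration and are unrelated to the BCSC dataset.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling (sketch): each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the *other* minority samples
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def spread_subsample(X, y, spread=1.0, rng=None):
    """Random undersampling (assumed SpreadSubsample-like behaviour): keep at
    most `spread` times as many rows per class as the smallest class has."""
    rng = rng or random.Random(0)
    counts = Counter(y)
    cap = int(spread * min(counts.values()))
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        keep.extend(idx[:cap])
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid(X, y, minority_label=1, smote_pct=100, spread=1.0, k=5):
    """Hybrid balancing: SMOTE the minority class first, then undersample."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = len(minority) * smote_pct // 100
    new = smote(minority, n_new, k=k)
    return spread_subsample(X + new, y + [minority_label] * len(new), spread)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 20 majority (label 0) vs 4 minority (label 1) 2-D points
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(4)]
    y = [0] * 20 + [1] * 4
    Xb, yb = hybrid(X, y, k=3)
    print(Counter(yb))  # 8 samples per class
```

With this toy data, undersampling alone (`spread_subsample(X, y)`) would keep only 4 rows per class, while the hybrid first doubles the minority class with synthetic points and therefore keeps 8 rows per class. In a setup like the one described, balancing would normally be applied only to the training data, so that sensitivity and specificity on the validation data reflect the original class distribution.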
Pages: 54-63
Number of pages: 10