Breast Cancer Detection from Imbalanced Clinical Data: A Comparative Study of Sampling Methods

Cited by: 0
Authors
Bahrami, Mahsa [1]
Vali, Mansour [1]
Kia, Hanif [1]
Affiliations
[1] KN Toosi Univ Technol, Fac Elect Engn, Dept Biomed Engn, Tehran, Iran
Source
2023 30TH NATIONAL AND 8TH INTERNATIONAL IRANIAN CONFERENCE ON BIOMEDICAL ENGINEERING, ICBME | 2023
Keywords
Breast cancer; clinical data; machine learning; sampling methods; WDBC; FEATURE-EXTRACTION; CLASSIFICATION; SVM; DIAGNOSIS
DOI
10.1109/ICBME61513.2023.10488624
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Accurate detection of breast cancer offers an opportunity to address and manage its associated side effects. Collecting patient data is often costly, which results in imbalanced clinical data and poses significant challenges for machine learning algorithms. In this paper, we provide a comparative analysis of sampling methods for breast cancer detection (BCD). We first pre-processed the clinical data and then compared various sampling methods for balancing it. We then used a support vector machine (SVM) to classify malignant and benign cases. The following methods were compared and evaluated for BCD from imbalanced clinical data: random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline SMOTE, K-means SMOTE, adaptive synthetic sampling (ADASYN), under-sampling of the majority class, Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), NearMiss, Tomek links, SMOTEENN, and SMOTETomek. We also computed feature importance with eXtreme Gradient Boosting (XGBoost), which offers pathologists useful guidance during data processing. We evaluated BCD on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which comprises 569 records. The best performance was achieved by SMOTEENN, with accuracy, sensitivity, and specificity of 98.4%, 97.6%, and 98.9%, respectively. The mean and worst concave points were also found to be the more important features for BCD in the WDBC dataset.
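To make two ideas from the abstract concrete, the sketch below illustrates (a) the interpolation step that underlies SMOTE-style over-sampling and (b) how sensitivity, specificity, and accuracy are computed from predictions. It is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the function names `smote_like_oversample` and `sensitivity_specificity_accuracy`, the toy two-feature data, and the parameter values are all hypothetical, and the ENN cleaning step of SMOTEENN and the SVM classifier are not reproduced here.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.
    Assumes both classes occur in y_true, so no denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Toy two-feature data (made-up values, not WDBC): 6 majority vs. 3 minority.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

# Add synthetic minority points until both classes have the same size.
n_missing = len(majority) - len(minority)
balanced_minority = minority + smote_like_oversample(minority, n_missing, k=2)
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. In practice, a maintained library such as imbalanced-learn, which provides a `SMOTEENN` class, would normally be preferred over a hand-written routine like this one.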
Pages: 145-149
Number of pages: 5