Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method

Cited by: 57
Authors
Alfian, Ganjar [1]
Syafrudin, Muhammad [2]
Fahrurrozi, Imam [1]
Fitriyani, Norma Latif [3]
Atmaji, Fransiskus Tatas Dwi [4]
Widodo, Tri [5]
Bahiyah, Nurul [6]
Benes, Filip [7]
Rhee, Jongtae [8]
Affiliations
[1] Univ Gadjah Mada, Vocat Coll, Dept Elect Engn & Informat, Yogyakarta 55281, Indonesia
[2] Sejong Univ, Dept Artificial Intelligence, Seoul 05006, South Korea
[3] Sejong Univ, Dept Data Sci, Seoul 05006, South Korea
[4] Telkom Univ, Ind & Syst Engn Sch, Bandung 40257, Indonesia
[5] Univ Teknol Yogyakarta, Dept Informat Technol Educ, Yogyakarta 55285, Indonesia
[6] IAIN Syekh Nurjati, Jurusan Ilmu Al Quran & Tafsir, Cirebon 45132, Indonesia
[7] VSB Tech Univ Ostrava, Fac Min & Geol, Dept Econ & Control Syst, Ostrava 70800, Czech Republic
[8] Dongguk Univ, Dept Ind & Syst Engn, Seoul 04620, South Korea
Keywords
breast cancer; support vector machine; extra-trees; risk factors; MODEL;
DOI
10.3390/computers11090136
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Developing a prediction model from risk factors can provide an efficient way to recognize breast cancer. Machine learning (ML) algorithms have been applied to increase the efficiency of early-stage diagnosis. This paper studies a support vector machine (SVM) combined with an extremely randomized trees (extra-trees) classifier to diagnose breast cancer at an early stage based on risk factors. The extra-trees classifier was used to remove irrelevant features, while the SVM was used to diagnose breast cancer status. The ML models were trained on a breast cancer dataset of 116 subjects and evaluated with stratified 10-fold cross-validation. The proposed combined SVM and extra-trees model achieved the highest accuracy, up to 80.23%, which was significantly better than that of the other ML models. The experimental results showed that applying extra-trees-based feature selection improved the average ML prediction accuracy by up to 7.29% compared with ML models without feature selection. The proposed model is expected to increase the efficiency of breast cancer diagnosis based on risk factors. In addition, we show how the proposed prediction model could be deployed for web-based breast cancer prediction. The proposed model is expected to improve diagnostic decision-support systems by predicting breast cancer accurately.
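The pipeline described in the abstract (extra-trees feature selection, then an SVM, evaluated with stratified 10-fold cross-validation) can be sketched with scikit-learn roughly as follows. This is an illustrative sketch, not the authors' implementation: the data is a synthetic stand-in with 116 samples, and the feature count, the scaling step, the mean-importance selection threshold, the RBF kernel, and all hyperparameters are assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 116 subjects as in the abstract.
# The 9 features and their informativeness are assumptions for illustration.
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=4, n_redundant=2, random_state=0
)

# Assumed pipeline: scale features, keep those whose extra-trees importance
# is at least the mean importance (SelectFromModel's default for tree
# ensembles), then classify with an RBF-kernel SVM.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0)),
    SVC(kernel="rbf"),
)

# Stratified 10-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

print(f"mean accuracy over {len(scores)} folds: {mean_accuracy:.3f}")
```

Placing the feature selector inside the pipeline means it is refit on the training portion of each fold, so the held-out fold does not influence which features are kept. Accuracy on this synthetic data says nothing about the figures reported in the abstract.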
Pages: 14
Related papers
37 in total
  • [21] Hortobagyi Gabriel N, 2005, Clin Breast Cancer, V6, P391, DOI 10.3816/CBC.2005.n.043
  • [22] Using AUC and accuracy in evaluating learning algorithms
    Huang, J
    Ling, CX
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (03) : 299 - 310
  • [23] Khatun Tania, 2021, 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), P1426, DOI 10.1109/ICIRCA51532.2021.9544879
  • [24] Applicability of two violence risk assessment tools in a psychiatric prison hospital population
    Krebs, Julia
    Negatsch, Vincent
    Berg, Inga
    Aigner, Annette
    Opitz-Welke, Annette
    Seidel, Peter
    Konrad, Norbert
    Voulgaris, Alexander
    [J]. BEHAVIORAL SCIENCES & THE LAW, 2020, 38 (05) : 471 - 481
  • [25] An enhanced Predictive heterogeneous ensemble model for breast cancer prediction
    Nanglia, S.
    Ahmad, Muneer
    Khan, Fawad Ali
    Jhanjhi, N. Z.
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 72
  • [26] Automated Paraphrase Quality Assessment Using Language Models and Transfer Learning
    Nicula, Bogdan
    Dascalu, Mihai
    Newton, Natalie N.
    Orcutt, Ellen
    McNamara, Danielle S.
    [J]. COMPUTERS, 2021, 10 (12)
  • [27] Using Resistin, glucose, age and BMI to predict the presence of breast cancer
    Patricio, Miguel
    Pereira, Jose
    Crisostomo, Joana
    Matafome, Paulo
    Gomes, Manuel
    Seica, Raquel
    Caramelo, Francisco
    [J]. BMC CANCER, 2018, 18
  • [28] Pedregosa F, 2011, J MACH LEARN RES, V12, P2825
  • [29] Machine Learning Based Computer Aided Diagnosis of Breast Cancer Utilizing Anthropometric and Clinical Features
    Rahman, M. M.
    Ghasemi, Y.
    Suley, E.
    Zhou, Y.
    Wang, S.
    Rogers, J.
    [J]. IRBM, 2021, 42 (04) : 215 - 226
  • [30] Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis
    Rasool, Abdur
    Bunterngchit, Chayut
    Tiejian, Luo
    Islam, Md Ruhul
    Qu, Qiang
    Jiang, Qingshan
    [J]. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (06)