Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method

Cited by: 57
Authors
Alfian, Ganjar [1]
Syafrudin, Muhammad [2]
Fahrurrozi, Imam [1]
Fitriyani, Norma Latif [3]
Atmaji, Fransiskus Tatas Dwi [4]
Widodo, Tri [5]
Bahiyah, Nurul [6]
Benes, Filip [7]
Rhee, Jongtae [8]
Affiliations
[1] Univ Gadjah Mada, Vocat Coll, Dept Elect Engn & Informat, Yogyakarta 55281, Indonesia
[2] Sejong Univ, Dept Artificial Intelligence, Seoul 05006, South Korea
[3] Sejong Univ, Dept Data Sci, Seoul 05006, South Korea
[4] Telkom Univ, Ind & Syst Engn Sch, Bandung 40257, Indonesia
[5] Univ Teknol Yogyakarta, Dept Informat Technol Educ, Yogyakarta 55285, Indonesia
[6] IAIN Syekh Nurjati, Jurusan Ilmu Al Quran & Tafsir, Cirebon 45132, Indonesia
[7] VSB Tech Univ Ostrava, Fac Min & Geol, Dept Econ & Control Syst, Ostrava 70800, Czech Republic
[8] Dongguk Univ, Dept Ind & Syst Engn, Seoul 04620, South Korea
Keywords
breast cancer; support vector machine; extra-trees; risk factors; MODEL
DOI
10.3390/computers11090136
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Developing a prediction model from risk factors can provide an efficient method to recognize breast cancer. Machine learning (ML) algorithms have been applied to increase the efficiency of early-stage diagnosis. This paper studies a support vector machine (SVM) combined with an extremely randomized trees (extra-trees) classifier to diagnose breast cancer at an early stage based on risk factors. The extra-trees classifier was used to remove irrelevant features, while the SVM was used to diagnose breast cancer status. A breast cancer dataset of 116 subjects was used by the ML models to predict breast cancer, and stratified 10-fold cross-validation was used for model evaluation. The proposed combined SVM and extra-trees model reached the highest accuracy, up to 80.23%, which was significantly better than the other ML models. The experimental results showed that applying extra-trees-based feature selection improved the average ML prediction accuracy by up to 7.29% compared with ML without feature selection. The proposed model is expected to increase the efficiency of breast cancer diagnosis based on risk factors. In addition, we show how the proposed prediction model could be employed for web-based breast cancer prediction. The proposed model is expected to improve diagnostic decision-support systems by predicting breast cancer accurately.
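The workflow the abstract describes (extra-trees-based feature selection, then an SVM, evaluated with stratified 10-fold cross-validation) can be sketched roughly as follows with scikit-learn. This is a minimal illustration and not the authors' implementation. The data is a synthetic stand-in with 116 samples; the feature count, the RBF kernel, the scaling step, the mean-importance selection threshold, and all hyperparameters are assumptions that do not come from the record above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a 116-subject risk-factor dataset (assumption:
# 9 features; the real data is not reproduced here).
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=4, n_redundant=2, random_state=0
)


def build_pipeline(use_selection: bool):
    """Return an SVM pipeline, optionally preceded by extra-trees selection."""
    steps = [StandardScaler()]  # assumption: features are standardized first
    if use_selection:
        # Keep features whose extra-trees importance is at least the mean
        # importance (SelectFromModel's default threshold for tree ensembles).
        selector = SelectFromModel(
            ExtraTreesClassifier(n_estimators=100, random_state=0)
        )
        steps.append(selector)
    steps.append(SVC(kernel="rbf"))  # assumption: RBF kernel, default C and gamma
    return make_pipeline(*steps)


# Stratified 10-fold cross-validation, as named in the abstract.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores_all = cross_val_score(build_pipeline(False), X, y, cv=cv, scoring="accuracy")
scores_sel = cross_val_score(build_pipeline(True), X, y, cv=cv, scoring="accuracy")

print(f"SVM, all features:        mean accuracy = {scores_all.mean():.4f}")
print(f"SVM + extra-trees select: mean accuracy = {scores_sel.mean():.4f}")
```

Placing the selector inside the pipeline means the features are re-selected on each training fold, so the held-out fold does not influence which features are kept. Accuracies on this synthetic data say nothing about the figures reported in the abstract.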
Pages: 14