A Comparative Analysis of Data Mining Techniques on Breast Cancer Diagnosis Data using WEKA Toolbox

Cited by: 0
Authors
Alshammari, Majdah [1]
Mezher, Mohammad [1]
Affiliations
[1] Fahad Bin Sultan Univ, Dept Comp Sci, Tabuk, Saudi Arabia
Keywords
Data mining; breast cancer; data mining techniques; classification; WEKA toolbox
DOI
10.14569/IJACSA.2020.0110829
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Breast cancer is considered the second most common cancer in women. It is fatal in fewer than half of all cases and is the main cause of mortality in women. It accounts for 16% of all cancer mortalities worldwide. Early diagnosis of breast cancer increases the chance of recovery, and data mining techniques can support early diagnosis. In this paper, an academic experimental breast cancer dataset is used to perform a practical data mining experiment with the Waikato Environment for Knowledge Analysis (WEKA) tool. The WEKA Java application is a rich resource for computing performance metrics during the execution of experiments. Pre-processing and feature extraction are used to optimize the data. The classification process used in this study is summarized through thirteen experiments. Additionally, 10 experiments using different classification algorithms were conducted. The algorithms used were Naive Bayes, Logistic Regression, Lazy IBk (Instance-Based learning with parameter k), Lazy KStar, Lazy Locally Weighted Learner, Rules ZeroR, Decision Stump, Decision Trees J48, Random Forest and Random Trees. The process of producing a predictive model was automated using classification accuracy. Further, several classification experiments on the Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer datasets were conducted to compare the success rates of the different methods. The results indicate that the Lazy IBk (k-NN) classifier can reach 98% accuracy when compared with the other classifiers. The main advantages of the study are the compact use of 13 different data mining models and 10 different performance measurements, and the plotting of classification errors.
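
The instance-based (k-NN) classification named in the abstract can be illustrated with a minimal sketch. This is not WEKA's IBk implementation and does not use the Wisconsin datasets: the toy two-cluster data, the function names (`knn_predict`, `cross_val_accuracy`, `make_toy_data`), the choice of k = 3, and the 10-fold cross-validation are all assumptions made here for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k=3, folds=10, seed=0):
    """Estimate accuracy with simple (unstratified) k-fold cross-validation."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for i in range(folds):
        test = rows[i::folds]  # every folds-th row forms the held-out fold
        train = [r for j, r in enumerate(rows) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


def make_toy_data(n_per_class=50, seed=1):
    """Hypothetical two-feature data: two well-separated Gaussian clusters."""
    rng = random.Random(seed)
    benign = [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), "benign")
              for _ in range(n_per_class)]
    malignant = [((rng.gauss(6, 0.5), rng.gauss(6, 0.5)), "malignant")
                 for _ in range(n_per_class)]
    return benign + malignant


if __name__ == "__main__":
    data = make_toy_data()
    print(f"10-fold accuracy (toy data): {cross_val_accuracy(data):.2f}")
```

Because the prediction is a vote over stored training instances, there is no separate training step; accuracy on real diagnostic data would depend on feature scaling and the choice of k, neither of which the abstract specifies.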
Pages: 224-229
Page count: 6