Predicting breast cancer biopsy outcomes from BI-RADS findings using random forests with chi-square and MI features

Cited by: 15
Authors
Williamson, Sheldon [1]
Vijayakumar, K. [2]
Kadam, Vinod J. [3]
Affiliations
[1] OntarioTech Univ, Fac Engn & Appl Sci, Oshawa, ON, Canada
[2] St Josephs Inst Technol, Dept Comp Sci & Engn, OMR, Chennai, Tamil Nadu, India
[3] Dr Babasaheb Ambedkar Technol Univ, Dept Informat Technol, Lonere, Maharashtra, India
Keywords
Breast cancer; Biopsy; BI-RADS; Random forests; Chi-square test; Mutual information; NEURAL-NETWORK; REASONING CLASSIFIER; SYSTEM
DOI
10.1007/s11042-021-11114-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mammography screening is one of the best available approaches for detecting early signs of breast cancer, and screening mammograms remain the gold standard for early breast cancer screening. However, the relatively low positive predictive value of breast biopsies prompted by this technique leads to unnecessary biopsies for abnormal findings that ultimately prove benign. Random Forest (RF), which evolved from Decision Trees (DTs), is one of the most practical and powerful ensemble learning methods (or meta-estimators). The Breast Imaging Reporting and Data System (BI-RADS) was developed as a standardized system for reporting breast mammograms and is used to sort unusual findings into categories. In this study, an RF classifier combined with Chi-Square (χ²) and Mutual Information (MI) Feature Selection (FS) procedures was applied to predict cancer biopsy outcomes from BI-RADS findings and the patient's age. The UCI Mammographic Mass dataset was used for validation, and the method was assessed with accuracy, AUC, and several other performance criteria under 10-fold cross-validation (CV). The proposed method achieved 84.70% accuracy and an AUC of 0.9023, and it also gave better results in terms of MCC and F1-score. The results were compared directly with the plain RF classifier and other state-of-the-art classification methods. This comparison indicates that the proposed model outperforms the RF classifier and all standard models used in the study on various performance indicators. These findings also confirm that the χ² and MI FS approaches correctly and efficiently selected a relevant and discriminating feature subset. The results further indicate that the suggested approach is comparable to the classification models reported in the relevant literature. It is an advantageous, practical, and sound method for predicting cancer biopsy outcomes.
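The feature-scoring idea named in the abstract (χ² and MI relevance of each feature to the biopsy outcome) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the textbook contingency-table χ² statistic and the empirical mutual information for discrete features, and the function names, the column names, and the eight-row toy data are all hypothetical.

```python
import math
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    stat = 0.0
    for f, nf in f_counts.items():
        for y, ny in y_counts.items():
            expected = nf * ny / n  # count expected under independence
            observed = joint.get((f, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(feature, labels):
    """Empirical mutual information I(X; Y) in nats for discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (f_counts[f] * y_counts[y]))
    return mi


def rank_features(columns, labels, score_fn):
    """Return feature names sorted from most to least relevant."""
    scores = {name: score_fn(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (NOT taken from the UCI Mammographic Mass dataset):
# 0 = benign, 1 = malignant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "margin": [1, 1, 1, 1, 2, 2, 2, 2],   # separates the two classes
    "density": [1, 2, 1, 2, 1, 2, 1, 2],  # independent of the class
}

chi_rank = rank_features(columns, labels, chi_square_score)
mi_rank = rank_features(columns, labels, mutual_information)
```

In a full pipeline, the top-ranked features under each score would be passed to a random forest and evaluated with 10-fold cross-validation. How the study combines the two rankings is not stated in the abstract, so that step is left out here.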
Pages: 36869-36889
Page count: 21