Development of Symbolic Expressions Ensemble for Breast Cancer Type Classification Using Genetic Programming Symbolic Classifier and Decision Tree Classifier

Cited by: 6
Authors
Andelic, Nikola [1]
Baressi Segota, Sandi [1]
Affiliations
[1] Univ Rijeka, Fac Engn, Dept Automat & Elect, Vukovarska 58, Rijeka 51000, Croatia
Keywords
breast cancer; genetic programming symbolic classifier; 5-fold cross validation; random hyperparameter value search; feature-selection algorithm; optimization
DOI
10.3390/cancers15133411
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Simple Summary
Breast cancer is a type of cancer with several sub-types, and correct sub-type classification based on a large number of gene expressions is challenging even for artificial intelligence. However, accurate classification of a patient's breast cancer sub-type is mandatory for applying proper treatment. To obtain equations that can be used for accurate classification of the breast cancer sub-type, the genetic programming symbolic classifier was utilized. The large number of input variables (gene expressions) was reduced using principal component analysis, and the imbalance between class samples was addressed using different oversampling methods. The proposed procedure generated equations that classify breast cancer sub-types with high classification accuracy, which was slightly improved by applying the decision tree classifier method.
Abstract
Breast cancer is a type of cancer with several sub-types. It occurs when cells in breast tissue grow out of control. Accurate sub-type classification for a patient diagnosed with breast cancer is mandatory for applying proper treatment. Breast cancer classification based on gene expression is challenging even for artificial intelligence (AI) because of the large number of gene expressions. The idea in this paper is to apply the genetic programming symbolic classifier (GPSC) to a publicly available dataset to obtain a set of symbolic expressions (SEs) that can classify the breast cancer sub-type from gene expressions with high classification accuracy. The initial problems with the dataset are a large number of input variables (54,676 gene expressions), a small number of samples (151), and six classes of breast cancer sub-types that are highly imbalanced. The large number of input variables is addressed with principal component analysis (PCA), while the small number of samples and the large imbalance between class samples are addressed by applying different oversampling methods, which generate different dataset variations. On each oversampled dataset, the GPSC with the random hyperparameter values search (RHVS) method is trained using 5-fold cross validation (5CV) to obtain a set of SEs. The best set of SEs is chosen based on the mean values of accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. In this case, the highest classification accuracy is equal to 0.992 across all evaluation metrics. The best set of SEs is additionally combined with a decision tree classifier, which slightly improves ACC to 0.994.
Pages: 27
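The abstract describes balancing the classes by oversampling and then evaluating with 5-fold cross validation. A minimal sketch of those two steps follows, using only the Python standard library. It is not the authors' implementation: plain random duplication stands in for the unspecified oversampling methods, the class labels and counts are toy values, and the function names `random_oversample` and `stratified_kfold` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed stand-in method)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def stratified_kfold(y, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs, dealing each class's
    shuffled indices round-robin so folds keep similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    buckets = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            buckets[j % k].append(i)
    pairs = []
    for f in range(k):
        test = sorted(buckets[f])
        train = sorted(i for g in range(k) if g != f for i in buckets[g])
        pairs.append((train, test))
    return pairs


# Toy, illustrative data: three imbalanced classes, one feature per row.
y = ["a"] * 10 + ["b"] * 4 + ["c"] * 1
X = [[float(i)] for i in range(len(y))]

X_bal, y_bal = random_oversample(X, y)
folds = stratified_kfold(y_bal, k=5)

print(Counter(y_bal))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Oversampling is done before splitting here because the abstract describes training on each oversampled dataset with 5CV. One consequence of that order is that a duplicated sample can appear in both the training and the test fold, which is worth keeping in mind when reading the resulting scores.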