Development of Symbolic Expressions Ensemble for Breast Cancer Type Classification Using Genetic Programming Symbolic Classifier and Decision Tree Classifier

Cited by: 6
Authors
Andelic, Nikola [1]
Baressi Segota, Sandi [1]
Affiliations
[1] Univ Rijeka, Fac Engn, Dept Automat & Elect, Vukovarska 58, Rijeka 51000, Croatia
Keywords
breast cancer; genetic programming symbolic classifier; 5-fold cross validation; random hyperparameter value search; feature-selection algorithm; optimization
DOI
10.3390/cancers15133411
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214
Abstract
Simple Summary: Breast cancer has several sub-types, and classifying the sub-type correctly from a large number of gene expressions is challenging even for artificial intelligence. Accurate classification of a patient's breast cancer sub-type is nevertheless required for applying the proper treatment. To obtain equations that can classify the breast cancer sub-type accurately, the genetic programming symbolic classifier was utilized. The large number of input variables (gene expressions) was reduced using principal component analysis, and the imbalance between class samples was addressed using different oversampling methods. The proposed procedure generated equations that classify breast cancer sub-types with high accuracy, which was slightly improved by applying the decision tree classifier method.

Breast cancer is a type of cancer with several sub-types. It occurs when cells in breast tissue grow out of control. Accurate sub-type classification for a patient diagnosed with breast cancer is required for applying the proper treatment. Breast cancer classification based on gene expression is challenging even for artificial intelligence (AI) because of the large number of gene expressions. The idea in this paper is to apply the genetic programming symbolic classifier (GPSC) to a publicly available dataset to obtain a set of symbolic expressions (SEs) that classify the breast cancer sub-type from gene expressions with high accuracy. The dataset poses three initial problems: a large number of input variables (54,676 gene expressions), a small number of samples (151), and six highly imbalanced classes of breast cancer sub-types. The large number of input variables is addressed with principal component analysis (PCA), while the small number of samples and the large imbalance between classes are addressed by applying different oversampling methods, each of which generates a different dataset variation. On each oversampled dataset, the GPSC with the random hyperparameter values search (RHVS) method is trained using 5-fold cross validation (5CV) to obtain a set of SEs. The best set of SEs is chosen based on the mean values of accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. The best set of SEs reaches a value of 0.992 on every one of these evaluation metrics. The best set of SEs is additionally combined with a decision tree classifier, which slightly improves ACC to 0.994.
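The oversample, 5-fold cross validation, and random hyperparameter search loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the PCA step is omitted, plain random duplication stands in for the oversampling methods (which the abstract does not name), and a small k-nearest-neighbour classifier stands in for the GPSC. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until all classes match the largest."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, row, k):
    """Stand-in classifier (assumption): majority vote among the k nearest rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, t)), label)
        for t, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def cv_accuracy(X, y, k_neighbors, folds):
    """Mean accuracy over the folds, each fold held out once."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        hits = sum(knn_predict(train_X, train_y, X[i], k_neighbors) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)

def random_search(X, y, n_trials, rng):
    """Draw hyperparameter values at random and keep the best mean 5-fold score."""
    folds = k_fold_indices(len(X), 5, rng)
    best_k, best_score = None, -1.0
    for _ in range(n_trials):
        k_neighbors = rng.randint(1, 7)  # hypothetical search range
        score = cv_accuracy(X, y, k_neighbors, folds)
        if score > best_score:
            best_k, best_score = k_neighbors, score
    return best_k, best_score

# Synthetic, imbalanced toy data: 3 classes with 30, 10 and 5 samples.
rng = random.Random(0)
centers = {0: (0.0, 0.0), 1: (5.0, 5.0), 2: (0.0, 5.0)}
sizes = {0: 30, 1: 10, 2: 5}
X, y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(sizes[label]):
        X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        y.append(label)

X_bal, y_bal = random_oversample(X, y, rng)
best_k, best_score = random_search(X_bal, y_bal, n_trials=5, rng=rng)
print("class counts after oversampling:", dict(Counter(y_bal)))
print("best k:", best_k, "mean 5-fold accuracy:", round(best_score, 3))
```

The sketch follows the order given in the abstract, oversampling first and cross-validating second. With duplication-based oversampling this lets copies of one sample fall into both training and validation folds, so the toy scores should be read as optimistic.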
Pages: 27