Microarray Gene Expression Data Classification via Wilcoxon Sign Rank Sum and Novel Grey Wolf Optimized Ensemble Learning Models

Cited by: 6
Authors
Saheed, Yakub K. [1 ]
Balogun, Bukola F. [2 ]
Odunayo, Braimah Joseph [3 ]
Abdulsalam, Mustapha [4 ]
Affiliations
[1] Amer Univ Nigeria, Sch Informat Technol & Comp, Yola 2250, Nigeria
[2] Kwara State Univ, Malete 241103, Kwara, Nigeria
[3] Univ Free State, Dept Math Stat & Actuarial Sci, ZA-9301 Bloemfontein, South Africa
[4] Skyline Univ, Dept Microbiol, Kano 700103, Nigeria
Keywords
Cancer; Feature extraction; Gene expression; Tumors; Colon; Prediction algorithms; Ensemble learning; Colon cancer; Feature selection; Machine learning; Microarray; Wilcoxon sign rank sum; Xgboost; Identification; Algorithm; Machine
DOI
10.1109/TCBB.2023.3305429
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Cancer is a deadly disease that affects people all over the world. Identifying a small set of genes relevant to a specific cancer can lead to effective treatments. The main difficulty with microarray datasets is their high dimensionality: the number of features is large relative to the small number of samples. Additionally, microarray data typically exhibit significant asymmetry in dimensionality as well as high levels of redundancy and noise. It is widely held that the majority of genes carry no information about the classes under study. Recent research has attempted to reduce this high dimensionality with various feature selection techniques. This paper presents new ensemble feature selection techniques based on the Wilcoxon Sign Rank Sum test (WCSRS) and the Fisher's test (F-test). In the first phase of the experiment, the data were preprocessed; feature selection was then performed with the WCSRS and the F-test, using the p-values of each test to identify cancerous genes. The selected gene set was used to classify cancer patients with ensemble learning models (ELMs): random forest (RF), extreme gradient boosting (Xgboost), CatBoost, and AdaBoost. To boost the performance of the ELMs, the parameters of all of them were optimized using the Grey Wolf optimizer (GWO). The experimental analysis was performed on a colon cancer dataset comprising 2000 genes from 62 patients (40 malignant and 22 benign). With the WCSRS test for feature selection, the optimized Xgboost achieved 100% accuracy; with the F-test for feature selection, the optimized CatBoost achieved 100% accuracy. This represents a 15% improvement over previously reported values in the literature.
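
The p-value-based gene selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the two-sample (rank-sum) form of the Wilcoxon test, since the tumor and normal groups are independent, and uses the standard normal approximation without tie or continuity correction. The function names, the 0.05 cutoff, and the synthetic data are hypothetical choices made only for illustration; the abstract does not state the threshold used.

```python
import math
import random


def rank_sum_pvalue(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n_a * (n_a + n_b + 1) / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (rank_sum_a - mean) / sd
    # 2 * (1 - Phi(|z|)) expressed with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_genes(expression, labels, alpha=0.05):
    """Return (gene_index, p_value) pairs with p < alpha, smallest p first.

    expression: one row per sample, one column per gene.
    labels: 1 for tumor samples, 0 for normal samples.
    alpha: assumed significance cutoff (hypothetical, not from the paper).
    """
    selected = []
    for gene in range(len(expression[0])):
        tumor = [row[gene] for row, y in zip(expression, labels) if y == 1]
        normal = [row[gene] for row, y in zip(expression, labels) if y == 0]
        p = rank_sum_pvalue(tumor, normal)
        if p < alpha:
            selected.append((gene, p))
    return sorted(selected, key=lambda pair: pair[1])


# Synthetic stand-in for the colon data: 40 tumor and 22 normal samples,
# 20 genes, with gene 0 shifted upward in the tumor group.
rng = random.Random(0)
labels = [1] * 40 + [0] * 22
expression = []
for y in labels:
    row = [rng.gauss(0.0, 1.0) for _ in range(20)]
    row[0] += 3.0 * y
    expression.append(row)

for gene, p in select_genes(expression, labels):
    print(f"gene {gene}: p = {p:.2e}")
```

The selected gene indices would then be used to subset the expression matrix before training the ensemble classifiers. An F-test variant would follow the same pattern, with only the per-gene p-value computation replaced.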
Pages: 3575-3587
Page count: 13