Microarray Gene Expression Data Classification via Wilcoxon Sign Rank Sum and Novel Grey Wolf Optimized Ensemble Learning Models

Cited by: 6
Authors
Saheed, Yakub K. [1]
Balogun, Bukola F. [2]
Odunayo, Braimah Joseph [3]
Abdulsalam, Mustapha [4]
Affiliations
[1] Amer Univ Nigeria, Sch Informat Technol & Comp, Yola 2250, Nigeria
[2] Kwara State Univ, Malete 241103, Kwara, Nigeria
[3] Univ Free State, Dept Math Stat & Actuarial Sci, ZA-9301 Bloemfontein, South Africa
[4] Skyline Univ, Dept Microbiol, Kano 700103, Nigeria
Keywords
Cancer; Feature extraction; Gene expression; Tumors; Colon; Prediction algorithms; Ensemble learning; Colon cancer; ensemble learning; feature selection; machine learning; microarray; wilcoxon sign rank sum; Xgboost; FEATURE-SELECTION; IDENTIFICATION; ALGORITHM; MACHINE;
DOI
10.1109/TCBB.2023.3305429
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Cancer is a deadly disease that affects people all over the world. Identifying a small set of genes relevant to a specific cancer can lead to effective treatments. The difficulty with microarray datasets is their high dimensionality: they contain a large number of features relative to the small number of samples. Additionally, microarray data typically exhibit significant asymmetry in dimensionality as well as high levels of redundancy and noise. It is widely held that the majority of genes lack informative value about the classes under study. Recent research has attempted to reduce this high dimensionality by employing various feature selection techniques. This paper presents new ensemble feature selection techniques via the Wilcoxon Sign Rank Sum test (WCSRS) and the Fisher's test (F-test). In the first phase of the experiment, data preprocessing was performed; subsequently, feature selection was performed via the WCSRS and the F-test, with the p-values (probability values) of each test used to identify cancerous genes. The extracted gene set was used to classify cancer patients using ensemble learning models (ELMs): random forest (RF), extreme gradient boosting (XGBoost), CatBoost, and AdaBoost. To boost the performance of the ELMs, we optimized the parameters of all of them using the Grey Wolf optimizer (GWO). The experimental analysis was performed on a colon cancer dataset comprising 2000 genes from 62 patients (40 malignant and 22 benign). Using the WCSRS test for feature selection, the optimized XGBoost achieved 100% accuracy. The optimized CatBoost, in turn, achieved 100% accuracy using the F-test for feature selection. This represents a 15% improvement over previously reported values in the literature.
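The p-value-based gene filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the two-sample Wilcoxon rank-sum test (the abstract's naming leaves open whether a rank-sum or signed-rank variant was used), a normal approximation without tie correction, and a hypothetical cutoff `alpha`. The function names and the synthetic data generator are illustrative only. The F-test filter, the ensemble classifiers, and the GWO tuning step are not covered.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])  # rank sum of the first group
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_genes(expression, labels, alpha=0.01):
    """Return (gene_index, p_value) pairs with p < alpha, smallest p first.

    expression: list of samples, each a list of per-gene values.
    labels: one 0/1 class label per sample.
    alpha: hypothetical significance cutoff (an assumption, not from the paper).
    """
    selected = []
    for g in range(len(expression[0])):
        pos = [row[g] for row, y in zip(expression, labels) if y == 1]
        neg = [row[g] for row, y in zip(expression, labels) if y == 0]
        p = rank_sum_p_value(pos, neg)
        if p < alpha:
            selected.append((g, p))
    return sorted(selected, key=lambda pair: pair[1])


def make_synthetic_data(seed=0, n_pos=40, n_neg=22, n_genes=50, n_informative=3):
    """Synthetic stand-in data: the first n_informative genes are shifted in class 1."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    expression = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if y == 1:
            for g in range(n_informative):
                row[g] += 3.0
        expression.append(row)
    return expression, labels
```

Calling `select_genes(*make_synthetic_data())` ranks the artificially shifted genes first. On real microarray data the selected indices would then be passed to a classifier, and a multiple-testing correction may be worth considering given the number of genes tested.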
Pages: 3575-3587
Number of pages: 13