A multi-objective optimization algorithm for gene selection and classification in cancer study

Cited by: 1
Authors
Banjoko, Alabi W. [1]
Yahya, Waheed B. [1]
Olaniran, Oyebayo R. [1]
Affiliation
[1] Univ Ilorin, Dept Stat, Ilorin, Kwara, Nigeria
Keywords
Feature selection; Multi-objective optimization; Sequential process; Optimal gene subset; Microarray dataset; MICROARRAY DATA; INFORMATIVE GENES; CLASS PREDICTION; EXPRESSION DATA; REGRESSION; MACHINE; HYBRID; DISCOVERY; NETWORK
DOI
10.1016/j.asoc.2025.112911
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, non-clinical diagnostics and predictive modeling in statistical genomics have attracted increasing attention, particularly in the analysis of microarray gene expression data. A key advantage of microarray data is that they allow the expression levels of thousands of genes to be analyzed simultaneously. A central challenge remains the efficient identification of a small subset of informative genes that are statistically correlated with specific groups of messenger ribonucleic acid (mRNA) tissue samples, followed by a meaningful biological interpretation of the findings. This paper introduces a two-stage hybrid multi-objective optimization (MOO) algorithm for feature selection and for classifying mRNA samples into their respective groups. The proposed method enhances the classification performance of support vector machines (SVMs) by framing gene selection as an MOO problem. First, a filter method (either the t-test or the F-test) was applied to eliminate the noisy genes that are prevalent in microarray gene expression data. The genes retained by the filter were then further refined through an MOO approach that applies Pareto optimality criteria to identify a minimal yet optimal gene subset. The effectiveness of the proposed method was evaluated on both simulated and published high-dimensional microarray datasets, using out-of-bag (OOB) accuracy, misclassification error rate, and several other performance metrics. The results demonstrated that the proposed method is robust, achieving high prediction accuracy with minimal gene subsets. Furthermore, it outperformed existing methods in selecting a small number of gene biomarkers that are strongly correlated with the biological response class.
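The two stages described in the abstract (a statistical filter followed by Pareto-based selection of a small gene subset) can be illustrated with a minimal Python sketch that assumes a two-class problem. It is not the authors' implementation. The Welch t-statistic stands in for the t-test filter, and the F-test variant for more than two classes is omitted. A leave-one-out nearest-centroid error is a hypothetical stand-in for the SVM out-of-bag error. Treating error rate and subset size as the two objectives, and searching over nested top-ranked subsets, are further assumptions. The function names (`t_filter`, `loo_centroid_error`, `pareto_front`, `select_genes`) and the toy data are illustrative only.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic comparing one gene's values in two classes."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def t_filter(X, labels, k):
    """Stage 1 (filter): indices of the k genes with the largest |t|.

    X is a list of samples, each a list of expression values (one per gene);
    labels holds class 0 or 1 for each sample.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, labels) if lab == 0]
        b = [row[g] for row, lab in zip(X, labels) if lab == 1]
        scores.append((abs(welch_t(a, b)), g))
    scores.sort(key=lambda s: s[0], reverse=True)  # most discriminative first
    return [g for _, g in scores[:k]]


def loo_centroid_error(X, labels, genes):
    """Leave-one-out error of a nearest-centroid rule restricted to `genes`.

    Hypothetical stand-in for the SVM / OOB error estimate used in the paper.
    """
    wrong = 0
    for i, (x, y) in enumerate(zip(X, labels)):
        dist = {}
        for c in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[c] = sum((x[g] - m) ** 2 for g, m in zip(genes, centroid))
        wrong += min(dist, key=dist.get) != y  # count misclassified held-out sample
    return wrong / len(X)


def dominates(p, q):
    """True if objective vector p Pareto-dominates q (all objectives minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(candidates):
    """Keep the (subset, objectives) pairs that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o[1], c[1]) for o in candidates)]


def select_genes(X, labels, k):
    """Stage 2 (assumed search): score nested top-ranked subsets on
    (error, size), then return the Pareto front and its lowest-error member
    (ties broken by smaller size)."""
    ranked = t_filter(X, labels, k)
    candidates = []
    for size in range(1, len(ranked) + 1):
        subset = ranked[:size]
        candidates.append((subset, (loo_centroid_error(X, labels, subset), size)))
    front = pareto_front(candidates)
    return front, min(front, key=lambda c: c[1])


# Toy data (hypothetical): rows are samples, columns are genes 0..3;
# only gene 0 separates the classes cleanly.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [1.0, 2.0, 0.5, 5.0],
    [1.2, 2.5, 0.3, 4.0],
    [0.9, 1.5, 0.8, 6.0],
    [1.1, 2.2, 0.1, 5.5],
    [3.0, 2.8, 0.4, 5.2],
    [3.2, 3.1, 0.7, 4.8],
    [2.9, 2.6, 0.2, 5.9],
    [3.1, 3.4, 0.6, 4.1],
]
front, best = select_genes(X, labels, k=4)
print(best)  # ([0], (0.0, 1))
```

Choosing the lowest-error member of the front is only one possible rule. Another option, such as the smallest subset whose error lies within a tolerance of the best, would be equally consistent with selecting a minimal yet optimal subset.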
Pages: 15