A multi-objective optimization algorithm for gene selection and classification in cancer study

Cited by: 1
Authors
Banjoko, Alabi W. [1 ]
Yahya, Waheed B. [1 ]
Olaniran, Oyebayo R. [1 ]
Affiliation
[1] Univ Ilorin, Dept Stat, Ilorin, Kwara, Nigeria
Keywords
Feature selection; Multi-objective optimization; Sequential process; Optimal gene subset microarray dataset; MICROARRAY DATA; INFORMATIVE GENES; CLASS PREDICTION; EXPRESSION DATA; REGRESSION; MACHINE; HYBRID; DISCOVERY; NETWORK
DOI
10.1016/j.asoc.2025.112911
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, non-clinical diagnostics and predictive modeling in statistical genomics have gained increased attention, particularly in the analysis of microarray gene expression data. A key advantage of microarray data is that the expression levels of thousands of genes can be analyzed simultaneously. The main challenge remains the efficient identification of a small subset of informative genes that are statistically correlated with specific groups of messenger ribonucleic acid (mRNA) tissue samples, followed by a meaningful biological interpretation of the findings. This paper introduces a two-stage hybrid multi-objective optimization (MOO) algorithm for feature selection and classification of mRNA samples into their respective groups. The proposed method enhances the classification performance of Support Vector Machines (SVMs) by framing gene selection as a MOO problem. First, a filter method (either the t-test or the F-test) was employed to eliminate noisy genes, which are prevalent in microarray gene expression data. The genes retained by the filter were then refined through a MOO approach that applies Pareto optimality criteria to identify a minimal yet optimal gene subset. The effectiveness of the proposed method was evaluated on both simulated and published high-dimensional microarray datasets, using out-of-bag (OOB) accuracy, misclassification error rates, and several other performance metrics. The results demonstrated that the proposed method is robust, achieving high prediction accuracy with minimal gene subsets. Furthermore, it outperformed existing methods in its ability to select a small number of gene biomarkers that are strongly correlated with the biological response class.
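The two-stage idea in the abstract (a t-test filter followed by Pareto-based refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `welch_t`, `filter_genes`, and `pareto_front` are hypothetical, the choice of Welch's form of the t statistic is an assumption, the toy matrix `X` is made up, and the error rates attached to each candidate subset are invented placeholders standing in for whatever an SVM evaluation would return.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / se2 ** 0.5 if se2 > 0 else 0.0

def filter_genes(X, y, keep):
    """Stage 1 (assumed form): rank genes by |t| between classes 0 and 1, keep the top `keep`."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]

def pareto_front(candidates):
    """Stage 2 (assumed form): keep subsets not dominated on (size, error), both minimised."""
    front = []
    for c in candidates:
        dominated = any(
            o["size"] <= c["size"] and o["error"] <= c["error"]
            and (o["size"] < c["size"] or o["error"] < c["error"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Toy expression matrix: 6 samples (rows) x 4 genes (columns); gene 2 separates the classes.
X = [
    [1.0, 5.1, 0.2, 3.0],
    [1.2, 4.9, 0.1, 3.1],
    [0.9, 5.0, 0.3, 2.9],
    [1.1, 5.2, 2.1, 3.0],
    [1.0, 4.8, 2.0, 3.2],
    [1.3, 5.0, 2.2, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

selected = filter_genes(X, y, keep=2)

# Candidate subsets with placeholder error rates (invented for illustration only).
candidates = [
    {"genes": [2], "size": 1, "error": 0.10},
    {"genes": [2, 0], "size": 2, "error": 0.05},
    {"genes": [2, 0, 1], "size": 3, "error": 0.05},
    {"genes": [0], "size": 1, "error": 0.40},
]
front = pareto_front(candidates)
```

In this sketch a subset survives only if no other subset is at least as small and at least as accurate while being strictly better on one of the two objectives, which is the usual definition of Pareto non-dominance for a two-objective minimisation.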
Pages: 15