A novel bio-inspired hybrid multi-filter wrapper gene selection method with ensemble classifier for microarray data

Cited by: 17
Authors
Nouri-Moghaddam, Babak [1]
Ghazanfari, Mehdi [1]
Fathian, Mohammad [1]
Affiliations
[1] Iran Univ Sci & Technol, Dept Ind Engn, Tehran 1684613114, Iran
Funding
UK Research and Innovation;
Keywords
Gene selection; DNA microarray data; Hybrid method; Multi-filter; Multi-objective wrapper; Forest optimization algorithm; Ensemble classification; PARTICLE SWARM OPTIMIZATION; MULTIOBJECTIVE FEATURE-SELECTION; EXTREME LEARNING-MACHINE; EXPRESSION DATA; MARKOV BLANKET; HARMONY SEARCH; ALGORITHM; GA;
DOI
10.1007/s00521-021-06459-9
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Microarray technology is one of the most important tools for collecting DNA expression data. It allows researchers to investigate types of diseases and their origins. However, microarray data often combine a small sample size, a very large number of genes, and imbalanced classes, among other issues, which makes classification models inefficient. To address this, a new hybrid solution based on a multi-filter and an adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct an ensemble classifier. In the proposed solution, a multi-filter model (i.e., an ensemble filter) serves as a preprocessing step that reduces the dataset's dimensions: five filter methods are applied to remove redundant and irrelevant genes, and their results are combined using a voting-based function. The results indicate that the proposed multi-filter is capable of reducing the gene subset size while selecting relevant genes. Then, AC-MOFOA, which is based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators, is presented. As a wrapper method, AC-MOFOA aims to reduce the dataset's dimensions, optimize the kernel extreme learning machine (KELM), and increase classification accuracy simultaneously. Next, an ensemble classifier model is built from the AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared with five hybrid multi-objective methods and three hybrid single-objective methods in terms of the number of selected genes, classification efficiency, execution time, time complexity, hypervolume indicator, and spacing metric.
According to the results, the proposed hybrid method increased the accuracy of the KELM on most datasets by reducing the dataset's dimensions, and it achieved similar or superior performance compared to other multi-objective methods. Furthermore, the proposed ensemble classifier provided better classification accuracy and generalizability than conventional ensemble methods on seven of the nine microarray datasets. Moreover, a comparison with three state-of-the-art ensemble generation methods indicates competitive performance: the proposed ensemble model achieved better results on five of the nine datasets.
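The abstract does not say which five filters are used or how the voting function is defined. As a rough illustration only, the sketch below assumes each filter returns a ranked gene list and that a gene is kept when enough filters place it in their top k. The function name `vote_select`, the parameters `top_k` and `min_votes`, and the gene names are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def vote_select(rankings, top_k, min_votes):
    """Keep genes that appear in the top_k of at least min_votes filter rankings.

    Assumption: this majority-style rule stands in for the paper's
    voting-based function, whose exact form is not given in the abstract.
    """
    votes = Counter()
    for ranked_genes in rankings:
        # Each filter casts one vote for every gene in its own top_k.
        votes.update(ranked_genes[:top_k])
    kept = [gene for gene, count in votes.items() if count >= min_votes]
    # Deterministic order: most votes first, ties broken by gene name.
    return sorted(kept, key=lambda gene: (-votes[gene], gene))

# Hypothetical rankings from five filters (best gene first).
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g5", "g3"],
    ["g1", "g3", "g2", "g6"],
    ["g4", "g1", "g2", "g3"],
    ["g2", "g6", "g1", "g5"],
]
selected = vote_select(rankings, top_k=3, min_votes=3)
print(selected)
```

Raising `min_votes` makes the preprocessing step stricter and the surviving gene subset smaller; lowering it passes more candidate genes on to the wrapper stage.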
Pages: 11531-11561
Page count: 31
Related papers
110 in total
[1]   A Genetic Algorithm Approach for Prediction of Linear Dynamical Systems [J].
Abo-Hammour, Za'er ;
Alsmadi, Othman ;
Momani, Shaher ;
Abu Arqub, Omar .
MATHEMATICAL PROBLEMS IN ENGINEERING, 2013, 2013
[2]   An Optimization Algorithm for Solving Systems of Singular Boundary Value Problems [J].
Abo-Hammour, Zaer ;
Abu Arqub, Omar ;
Alsmadi, Othman ;
Momani, Shaher ;
Alsaedi, Ahmed .
APPLIED MATHEMATICS & INFORMATION SCIENCES, 2014, 8 (06) :2809-2821
[3]   Solving Singular Two-Point Boundary Value Problems Using Continuous Genetic Algorithm [J].
Abu Arqub, Omar ;
Abo-Hammour, Zaer ;
Momani, Shaher ;
Shawagfeh, Nabil .
ABSTRACT AND APPLIED ANALYSIS, 2012
[4]   Agarwalla P., 2018, MULTIOBJECTIVE OPTIM, P195, DOI 10.1007/978-981-13-1471-1_9
[5]   A novel blood pressure estimation method based on the classification of oscillometric waveforms using machine-learning methods [J].
Alghamdi, Ahmed S. ;
Polat, Kemal ;
Alghoson, Abdullah ;
Alshdadi, Abdulrahman A. ;
Abd El-Latif, Ahmed A. .
APPLIED ACOUSTICS, 2020, 164
[6]   A Survey on Hybrid Feature Selection Methods in Microarray Gene Expression Data for Cancer Classification [J].
Almugren, Nada ;
Alshamlan, Hala .
IEEE ACCESS, 2019, 7 :78533-78548
[7]   On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems [J].
Amaldi, E ;
Kann, V .
THEORETICAL COMPUTER SCIENCE, 1998, 209 (1-2) :237-260
[8]   Enriched random forests [J].
Amaratunga, Dhammika ;
Cabrera, Javier ;
Lee, Yung-Seop .
BIOINFORMATICS, 2008, 24 (18) :2010-2014
[9]   Optimizing multi-objective PSO based feature selection method using a feature elitism mechanism [J].
Amoozegar, Maryam ;
Minaei-Bidgoli, Behrouz .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 113 :499-514
[10]   CANCER MICROARRAY DATA FEATURE SELECTION USING MULTI-OBJECTIVE BINARY PARTICLE SWARM OPTIMIZATION ALGORITHM [J].
Annavarapu, Chandra Sekhara Rao ;
Dara, Suresh ;
Banka, Haider .
EXCLI JOURNAL, 2016, 15 :460-473