MGRFE: Multilayer Recursive Feature Elimination Based on an Embedded Genetic Algorithm for Cancer Classification

Times cited: 40
Authors
Peng, Cheng [1, 2]
Wu, Xinyu [1]
Yuan, Wen [1]
Zhang, Xinran [1]
Zhang, Yu [1]
Li, Ying [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Jilin, Peoples R China
[2] Tsinghua Univ, Sch Software, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Microwave integrated circuits; Genetic algorithms; Cancer; Nonhomogeneous media; Gene expression; Heuristic algorithms; Gene selection; genetic algorithm; recursive feature elimination; microarray data; cancer classification
DOI
10.1109/TCBB.2019.2921961
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Microarray gene expression data have become a topic of great interest for cancer classification and for further research in bioinformatics. Nonetheless, because of the "large p, small n" paradigm of limited biosamples and high-dimensional data, gene selection, which aims to select a minimal number of discriminatory genes closely associated with a phenotype, is becoming a demanding task. Feature (gene) selection remains a challenging problem owing to its nondeterministic polynomial time complexity, and thus most existing feature selection algorithms rely on heuristic rules. A multilayer recursive feature elimination method based on an embedded integer-coded genetic algorithm, MGRFE, is proposed here, which aims to select the gene combination with minimal size and maximal information. On 19 benchmark microarray datasets, including multiclass and imbalanced datasets, MGRFE outperforms state-of-the-art feature selection algorithms, achieving better cancer classification accuracy with fewer selected genes. MGRFE could be regarded as a promising feature selection method for high-dimensional datasets, especially gene expression data. Moreover, the genes selected by MGRFE have close biological relevance to cancer phenotypes. The source code of the proposed algorithm and all 19 datasets used in this paper are available at https://github.com/Pengeace/MGRFE-GaRFE.
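To make the idea in the abstract concrete, the following is a minimal, single-layer sketch of recursive feature elimination driven by an integer-coded genetic algorithm. It is not the authors' implementation (that is in the repository linked above). Everything in it is an assumption chosen for illustration: the function names `make_toy_data`, `loo_accuracy`, and `ga_rfe`, the nearest-centroid classifier, the leave-one-out fitness, the crossover and mutation operators, and all parameter values. Each individual is a tuple of distinct feature indices, and the subset size shrinks by one after every round of GA generations.

```python
import random

def make_toy_data(n=20, p=30, seed=1):
    """Synthetic two-class data; only features 0 and 1 carry class signal (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[0] += 4.0 * label
        row[1] -= 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # squared Euclidean distance from sample i to the class centroid
            return sum((X[i][g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)

def ga_rfe(X, y, start_size=8, pop_size=12, generations=6, seed=0):
    """Integer-coded GA whose individuals shrink by one gene after each round."""
    rng = random.Random(seed)
    pool = list(range(len(X[0])))
    cache = {}

    def fitness(ind):
        # higher accuracy first; on ties, prefer the smaller subset
        if ind not in cache:
            cache[ind] = (loo_accuracy(X, y, ind), -len(ind))
        return cache[ind]

    population = [tuple(sorted(rng.sample(pool, start_size)))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    size = start_size
    while size >= 1:
        for _ in range(generations):
            children = []
            while len(children) < pop_size:
                # tournament selection of two parents
                a = max(rng.sample(population, 3), key=fitness)
                b = max(rng.sample(population, 3), key=fitness)
                # crossover: draw `size` genes from the union of both parents
                child = rng.sample(sorted(set(a) | set(b)), size)
                unused = [g for g in pool if g not in child]
                if unused and rng.random() < 0.2:
                    # mutation: swap one gene for an unused one
                    child[rng.randrange(size)] = rng.choice(unused)
                children.append(tuple(sorted(child)))
            population = children
        best = max(population + [best], key=fitness)
        size -= 1
        if size >= 1:
            # elimination step: every individual drops one randomly chosen gene
            population = [tuple(sorted(rng.sample(ind, size)))
                          for ind in population]
    return best, fitness(best)[0]

X, y = make_toy_data()
best_genes, best_acc = ga_rfe(X, y)
print(best_genes, best_acc)
```

The fitness tuple ranks accuracy first and subset size second, which mirrors the stated goal of a gene combination with minimal size and maximal information. A multilayer variant would rerun this procedure on progressively narrower candidate pools, which the sketch leaves out.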
Pages: 621-632
Page count: 12