Gene Selection for Microarray Cancer Classification based on Manta Rays Foraging Optimization and Support Vector Machines

Cited by: 9
Authors
Houssein, Essam H. [1]
Hassan, Hager N. [1]
Al-Sayed, Mustafa M. [1]
Nabil, Emad [2, 3]
Affiliations
[1] Minia Univ, Fac Comp & Informat, Al Minya, Egypt
[2] Cairo Univ, Fac Comp & Artificial Intelligence, Giza, Egypt
[3] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah, Saudi Arabia
Keywords
Microarray; Gene expression; Gene selection; Cancer classification; Feature selection; Manta Ray Foraging Optimization algorithm; Support vector machines; Minimum Redundancy Maximum Relevance; PARTICLE SWARM OPTIMIZATION; EFFICIENT FEATURE-SELECTION; FEATURE SUBSET-SELECTION; RANDOM SUBSPACE METHOD; HIGH-DIMENSIONAL DATA; MOLECULAR CLASSIFICATION; MUTUAL INFORMATION; SVM-RFE; ALGORITHM; TUMOR;
DOI
10.1007/s13369-021-06102-8
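Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code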
07 ; 0710 ; 09 ;
Abstract
In DNA microarray applications, many techniques have been proposed for cancer classification, either to distinguish normal from cancerous cases or to classify different types of cancer. Gene selection is usually required as a preliminary step in a cancer classification problem. This step aims to select the most informative genes from a very large number of genes, which is itself an important problem. Although many studies have addressed this problem, they fall short of obtaining the fewest and most informative genes with the highest accuracy and little effort, given the high dimensionality of microarray datasets. The manta ray foraging optimization (MRFO) algorithm is a recent meta-heuristic algorithm that mimics the foraging behavior of manta rays. MRFO has achieved promising results in other fields, such as solar generating units. Because of its high accuracy, the support vector machine (SVM) is the most commonly used classification algorithm in cancer studies, especially with microarray data. To exploit the strengths of both algorithms (i.e., MRFO and SVM), this paper proposes a hybrid algorithm that selects the most predictive and informative genes for cancer classification. Binary microarray datasets (colon and leukemia1) and multi-class microarray datasets (SRBCT, lymphoma, and leukemia2) are used to evaluate the accuracy of the proposed technique. Like other optimization techniques, MRFO suffers from problems related to the high dimensionality and complexity of microarray data. To solve these problems and improve performance, the minimum redundancy maximum relevance (mRMR) method is used as a preprocessing stage. The proposed technique is compared with the most common cancer classification algorithms. The experimental results show that the proposed technique achieves the highest accuracy with the fewest informative genes and little effort.
Pages: 2555-2572
Page count: 18