Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis

Cited by: 138
Authors
Tang, Yuchun
Zhang, Yan-Qing
Huang, Zhen
Affiliations
[1] Secure Comp Corp, Alpharetta, GA 30022 USA
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30302 USA
[3] Georgia State Univ, Dept Chem, Atlanta, GA 30303 USA
[4] Georgia State Univ, Dept Biol, Atlanta, GA 30303 USA
Keywords
bioinformatics; microarray gene expression data analysis; cancer classification; support vector machines; gene selection; feature selection; recursive feature elimination
DOI
10.1109/TCBB.2007.1028
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Though many algorithms have been developed, the Support Vector Machine-Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. It assumes that a smaller "filter-out" factor in the SVM-RFE, which results in a smaller number of gene features eliminated in each recursion, should lead to extraction of a better gene subset. Because the SVM-RFE is highly sensitive to the "filter-out" factor, our simulations have shown that this assumption is not always correct and that the SVM-RFE is an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and other applications, a new two-stage SVM-RFE algorithm has been developed. It is designed to effectively eliminate most of the irrelevant, redundant, and noisy genes while keeping information loss small at the first stage. A fine selection for the final gene subset is then performed at the second stage. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE to achieve better algorithm utility. We have demonstrated that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods based on our analysis of three publicly available microarray expression data sets. Furthermore, the two-stage SVM-RFE is computationally efficient because its time complexity is O(d log₂ d), where d is the size of the original gene set.
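The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: recursive elimination driven by a "filter-out" factor, run first with a coarse factor and then one gene at a time. The names `rfe`, `two_stage_rfe`, and `score_fn` are hypothetical and not taken from the paper. In a real SVM-RFE, `score_fn` would retrain a linear SVM on the surviving genes and return a weight-based ranking score per gene; here a fixed toy score table stands in for that step.

```python
def rfe(features, score_fn, filter_out, target):
    """Recursive feature elimination (sketch).

    Each round re-scores the surviving features and drops the lowest-scored
    fraction `filter_out` of them (at least one), until `target` remain.
    Returns the surviving features and the number of rounds used.
    """
    kept = list(features)
    rounds = 0
    while len(kept) > target:
        scores = score_fn(kept)  # assumption: maps feature name -> importance
        drop = max(1, int(len(kept) * filter_out))
        drop = min(drop, len(kept) - target)  # never go below the target size
        lowest = set(sorted(kept, key=lambda f: scores[f])[:drop])
        kept = [f for f in kept if f not in lowest]
        rounds += 1
    return kept, rounds


def two_stage_rfe(features, score_fn, coarse_filter_out, stage1_size, final_size):
    """Coarse elimination first, then one-feature-at-a-time fine selection."""
    stage1, r1 = rfe(features, score_fn, coarse_filter_out, stage1_size)
    # filter_out=0.0 makes every round drop exactly one feature.
    final, r2 = rfe(stage1, score_fn, 0.0, final_size)
    return stage1, final, (r1, r2)


# Toy data: 16 hypothetical genes whose score equals their index.
toy_scores = {f"g{i}": i for i in range(16)}


def toy_score_fn(kept):
    # Stand-in for SVM training; a real scorer would depend on `kept`.
    return toy_scores


stage1, final, rounds = two_stage_rfe(list(toy_scores), toy_score_fn, 0.5, 4, 2)
print(stage1, final, rounds)
```

With a filter-out factor of 0.5 the coarse stage halves the gene set each round, so the number of rounds grows roughly like log₂ d, while the fine stage costs one round per eliminated gene on a much smaller set.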
Pages: 365-381
Page count: 17
Source
IEEE/ACM Trans. Comput. Biol. Bioinf., 2007, 3 (365-381)