Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis

Cited by: 138
Authors
Tang, Yuchun
Zhang, Yan-Qing
Huang, Zhen
Affiliations
[1] Secure Comp Corp, Alpharetta, GA 30022 USA
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30302 USA
[3] Georgia State Univ, Dept Chem, Atlanta, GA 30303 USA
[4] Georgia State Univ, Dept Biol, Atlanta, GA 30303 USA
Keywords
bioinformatics; microarray gene expression data analysis; cancer classification; support vector machines; gene selection; feature selection; recursive feature elimination;
DOI
10.1109/TCBB.2007.1028
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Many algorithms have been developed for this task, and the Support Vector Machine-Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms among them. The SVM-RFE rests on the assumption that a smaller "filter-out" factor, which eliminates fewer gene features in each recursion, should lead to extraction of a better gene subset. Our simulations show that the SVM-RFE is highly sensitive to the "filter-out" factor, so this assumption is not always correct and the SVM-RFE is an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and for other applications, a new two-stage SVM-RFE algorithm has been developed. The first stage is designed to eliminate most of the irrelevant, redundant, and noisy genes while keeping information loss small. The second stage then performs a fine selection to produce the final gene subset. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE and so achieves better algorithm utility. Our analysis of three publicly available microarray expression data sets demonstrates that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods. Furthermore, the two-stage SVM-RFE is computationally efficient: its time complexity is O(d log₂ d), where d is the size of the original gene set.
Pages: 365-381
Page count: 17
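
The role of the "filter-out" factor described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the two stages are configured, so the sketch assumes a coarse first stage that removes a fixed fraction of the surviving genes per recursion and a second stage that removes one gene per recursion. The names `rfe`, `two_stage_rfe`, and `toy_rank` are hypothetical, and `toy_rank` stands in for the ranking criterion (in SVM-RFE this would typically come from the weights of a linear SVM retrained on the surviving genes).

```python
from typing import Callable, List, Sequence, Tuple

# A ranker maps the surviving genes to one importance score per gene.
# In SVM-RFE the scores would come from a retrained linear SVM; here the
# ranker is pluggable so the elimination logic can be shown on its own.
Ranker = Callable[[Sequence[int]], List[float]]


def rfe(genes: Sequence[int], rank: Ranker, filter_out: float,
        target: int) -> Tuple[List[int], int]:
    """Recursively drop the lowest-scored genes until `target` remain.

    `filter_out` is the fraction of surviving genes removed per recursion;
    at least one gene is always removed, so 0.0 means one gene at a time.
    Returns the surviving genes and the number of recursions (rankings).
    """
    if not 0.0 <= filter_out < 1.0:
        raise ValueError("filter_out must be in [0, 1)")
    if target < 1:
        raise ValueError("target must be at least 1")
    surviving = list(genes)
    rounds = 0
    while len(surviving) > target:
        scores = rank(surviving)
        n_drop = max(1, int(len(surviving) * filter_out))
        n_drop = min(n_drop, len(surviving) - target)  # never overshoot
        worst_first = sorted(range(len(surviving)), key=lambda i: scores[i])
        dropped = set(worst_first[:n_drop])
        surviving = [g for i, g in enumerate(surviving) if i not in dropped]
        rounds += 1
    return surviving, rounds


def two_stage_rfe(genes: Sequence[int], rank: Ranker, coarse_factor: float,
                  stage1_target: int,
                  final_target: int) -> Tuple[List[int], int, int]:
    """Assumed two-stage schedule: coarse elimination, then fine selection."""
    kept, coarse_rounds = rfe(genes, rank, coarse_factor, stage1_target)
    final, fine_rounds = rfe(kept, rank, 0.0, final_target)
    return final, coarse_rounds, fine_rounds


def toy_rank(genes: Sequence[int]) -> List[float]:
    """Hypothetical stand-in ranker: a gene's id is its importance."""
    return [float(g) for g in genes]


genes = list(range(1000))
selected, coarse_rounds, fine_rounds = two_stage_rfe(
    genes, toy_rank, coarse_factor=0.5, stage1_target=32, final_target=8)
_, single_stage_rounds = rfe(genes, toy_rank, 0.0, 8)
print(selected, coarse_rounds, fine_rounds, single_stage_rounds)
```

With a constant fractional factor, the number of recursions in the coarse stage grows only logarithmically with the number of genes, whereas removing one gene per recursion from the full set needs a number of recursions linear in d. Each recursion in a real SVM-RFE run retrains a classifier, so the schedule largely determines the cost.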