Use of SVM-based ensemble feature selection method for gene expression data analysis

Cited by: 0
Authors
Zhang, Shizhi [1]
Zhang, Mingjin [1]
Affiliations
[1] Qinghai Minzu Univ, Sch Chem & Chem Engn, Xining 810007, Peoples R China
Keywords
ensemble feature selection; gene expression data; support vector machine; CANCER; IDENTIFICATION; CLASSIFICATION; DISCOVERY; PATTERNS;
DOI
10.1515/sagmb-2022-0002
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Gene selection is a key step in gene expression data analysis. This paper proposes an SVM-based ensemble feature selection method. First, the method builds many subsets using Monte Carlo sampling. Second, it ranks all features on each subset and integrates the rankings into a final ranking list. Finally, it determines the optimal feature set with a backward feature elimination strategy. The method is applied to four public datasets (Leukemia, Prostate, Colorectal, and SMK_CAN), yielding 7, 10, 13, and 32 selected features, respectively. The AUC values obtained on independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and that the method is therefore effective for gene selection from gene expression data.
Pages: 10
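
The three steps named in the abstract (Monte Carlo subsets, per-subset SVM ranking merged into one list, backward elimination) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the details, so the following are all assumptions made for illustration: subsets are drawn as 80% of the samples without replacement, features are ranked by the absolute weight of a small linear SVM trained by batch subgradient descent (no bias term), rankings are merged by summed rank, and the backward elimination is scored by accuracy on a held-out validation split, with ties going to the smaller feature set. The toy data and every function name are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Batch subgradient descent on the L2-regularised hinge loss (no bias)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            # Only samples that violate the margin contribute to the subgradient.
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    hits = sum(1 for xi, yi in zip(X, y)
               if yi * sum(wj * xj for wj, xj in zip(w, xi)) > 0)
    return hits / len(X)


def take_columns(X, feats):
    return [[row[j] for j in feats] for row in X]


def ensemble_ranking(X, y, n_subsets=20, frac=0.8, seed=0):
    """Rank features on random sample subsets and merge by summed rank."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rank_sum = [0] * p
    for _ in range(n_subsets):
        rows = rng.sample(range(n), int(frac * n))  # assumed sampling scheme
        w = train_linear_svm([X[i] for i in rows], [y[i] for i in rows])
        order = sorted(range(p), key=lambda j: -abs(w[j]))
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(p), key=lambda j: rank_sum[j])  # best feature first


def backward_elimination(ranking, X_tr, y_tr, X_val, y_val):
    """Drop the lowest-ranked feature one at a time; keep the best-scoring set."""
    feats, best_feats, best_acc = list(ranking), list(ranking), -1.0
    while feats:
        w = train_linear_svm(take_columns(X_tr, feats), y_tr)
        acc = accuracy(w, take_columns(X_val, feats), y_val)
        if acc >= best_acc:  # ties favour the smaller set (an assumption)
            best_feats, best_acc = list(feats), acc
        feats = feats[:-1]
    return best_feats, best_acc


# Toy data: features 0 and 1 carry the class signal, features 2..9 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(60)]
X = [[yi * 3 + rng.gauss(0, 0.5), yi * 3 + rng.gauss(0, 0.5)]
     + [rng.gauss(0, 1) for _ in range(8)] for yi in y]
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

ranking = ensemble_ranking(X_tr, y_tr)
selected, best_acc = backward_elimination(ranking, X_tr, y_tr, X_val, y_val)
print("ranking:", ranking)
print("selected:", selected, "validation accuracy:", best_acc)
```

On real expression data one would normally standardise the features, use an established SVM implementation, and score the elimination step with cross-validated AUC rather than a single validation split. Those choices are left out here to keep the sketch short.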