Feature clustering based support vector machine recursive feature elimination for gene selection

Cited by: 104
Authors
Huang, Xiaojuan [1, 2]
Zhang, Li [1, 2]
Wang, Bangjun [1, 2]
Li, Fanzhang [1, 2]
Zhang, Zhao [1, 2]
Affiliations
[1] Soochow Univ Suzhou, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Soochow Univ Suzhou, Joint Int Res Lab Machine Learning & Neuromorph C, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Support vector machine; Feature selection; Gene clustering; Recursive feature elimination; Gene relevancy; Gene redundancy; SVM-RFE; CANCER CLASSIFICATION; EXPRESSION DATA; DISCOVERY; RELEVANCE; FILTER;
DOI
10.1007/s10489-017-0992-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In a DNA microarray dataset, gene expression data often has a huge number of features (referred to as genes) but only a small number of samples. With the development of DNA microarray technology, the number of dimensions is increasing faster than before, which can lead to the curse of dimensionality. To obtain good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering; the resulting method is called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs a rough gene selection and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which the genes have similar expression profiles. Then, a representative gene is found for each gene group, which yields a representative gene set. Finally, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce both the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE achieves better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.
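The pipeline outlined in the abstract (cluster genes, keep one representative per cluster, then rank the representatives with SVM-RFE) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not say which clustering algorithm is used or how the representative gene is chosen, so k-means and "the gene closest to the cluster centroid" are assumptions made here, as are the function name `fcsvm_rfe_sketch`, its parameters, and the synthetic data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def fcsvm_rfe_sketch(X, y, n_clusters=10, seed=0):
    """Return representative gene indices, most important first (hypothetical helper)."""
    # Cluster genes (columns of X): each gene's expression profile is one point.
    # Assumption: k-means stands in for the unspecified clustering algorithm.
    profiles = X.T
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(profiles)

    # Assumption: the representative is the gene closest to its cluster centroid.
    representatives = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(profiles[members] - km.cluster_centers_[c], axis=1)
        representatives.append(members[np.argmin(dists)])
    representatives = np.array(representatives)

    # Rank the representatives with linear SVM-RFE, dropping one gene per step.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=1, step=1)
    rfe.fit(X[:, representatives], y)
    order = np.argsort(rfe.ranking_)  # ranking_ == 1 marks the last gene kept
    return representatives[order]


# Synthetic stand-in for a microarray dataset: 40 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 2.0  # make the first five genes informative
ranked = fcsvm_rfe_sketch(X, y, n_clusters=10)
print(ranked)
```

Because SVM-RFE runs only on the `n_clusters` representatives rather than on all genes, the number of SVM fits in the elimination loop drops accordingly, which is the source of the computational saving the abstract describes.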
Pages: 594-607
Number of pages: 14