Feature clustering based support vector machine recursive feature elimination for gene selection

Cited by: 104

Authors:

Huang, Xiaojuan ^{[1,2]}

Zhang, Li ^{[1,2]}

Wang, Bangjun ^{[1,2]}

Li, Fanzhang ^{[1,2]}

Zhang, Zhao ^{[1,2]}

Affiliations:

[1] Soochow Univ Suzhou, Sch Comp Sci & Technol, Suzhou, Peoples R China

[2] Soochow Univ Suzhou, Joint Int Res Lab Machine Learning & Neuromorph C, Suzhou, Peoples R China

Source:

APPLIED INTELLIGENCE | 2018 / Vol. 48 / No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Support vector machine; Feature selection; Gene clustering; Recursive feature elimination; Gene relevancy; Gene redundancy; SVM-RFE; CANCER CLASSIFICATION; EXPRESSION DATA; DISCOVERY; RELEVANCE; FILTER;

DOI:

10.1007/s10489-017-0992-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In a DNA microarray dataset, gene expression data often has a huge number of features (referred to as genes) but only a small number of samples. With the development of DNA microarray technology, the number of dimensions is increasing faster than before, which can lead to the curse of dimensionality. To obtain good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering; the resulting method is called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs a rough gene selection and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which the genes have similar expression profiles. Then, a representative gene is found for each gene group, which yields a representative gene set. Finally, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce both the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE achieves better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.
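
The three steps described in the abstract (cluster genes, keep one representative per cluster, rank the representatives with SVM-RFE) can be illustrated with a minimal Python sketch. The abstract does not name the clustering algorithm or the rule for choosing a representative gene, so the choices below are assumptions made only for illustration: k-means on standardized expression profiles, and the gene nearest to each cluster centroid as the representative. The function name `fcsvm_rfe_sketch`, its parameters, and the synthetic data are hypothetical and do not come from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def fcsvm_rfe_sketch(X, y, n_clusters=20, random_state=0):
    """Return representative gene indices ranked best-first.

    X has shape (n_samples, n_genes). Illustrative only; the clustering
    method and representative rule are assumptions, not the paper's.
    """
    # Step 1: cluster genes. Each gene is a column of X, so cluster rows of X.T.
    # Assumption: k-means on per-gene standardized expression profiles.
    profiles = X.T.astype(float)
    std = profiles.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    profiles = (profiles - profiles.mean(axis=1, keepdims=True)) / std
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    km.fit(profiles)

    # Step 2: pick one representative gene per cluster.
    # Assumption: the gene closest to its cluster centroid.
    representatives = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        if members.size == 0:
            continue
        dists = np.linalg.norm(profiles[members] - km.cluster_centers_[c], axis=1)
        representatives.append(members[np.argmin(dists)])
    representatives = np.array(representatives)

    # Step 3: rank the representatives with SVM-RFE (linear SVM, one gene
    # eliminated per iteration, so ranking_ is a full ordering 1..k).
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=1, step=1)
    rfe.fit(X[:, representatives], y)
    return representatives[np.argsort(rfe.ranking_)]


# Synthetic stand-in for a microarray dataset: 40 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.array([0, 1] * 20)
X[y == 1, :5] += 2.0  # make the first five genes informative
ranked = fcsvm_rfe_sketch(X, y, n_clusters=10)
print(ranked)
```

Because SVM-RFE here runs on at most `n_clusters` representative genes instead of all genes, the number of SVM fits drops accordingly, which is the source of the reduction in computational cost that the abstract describes.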


Pages: 594-607

Page count: 14
