A Stable Gene Subset Selection Algorithm for Cancers

Cited by: 3
Authors
Xie, Juanying [1]
Gao, Hongchao [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Xian 710062, Peoples R China
Source
HEALTH INFORMATION SCIENCE (HIS 2015) | 2015 / Vol. 9085
Keywords
Gene selection; Gene subsets; K-means; Assemble; Pearson correlation coefficient; Cancers; CLASSIFICATION;
DOI
10.1007/978-3-319-19156-0_12
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To address the problem that the genes chosen by gene subset selection algorithms depend on the training subset, we propose an ensemble ("assemble") method for selecting discriminative genes for cancers, so that a stable gene subset can be obtained. We randomly draw a proportion of the samples from the training subset and cluster the genes of these samples with K-means, then select one representative gene from each cluster according to its weight, estimated from the Pearson correlation coefficient between the gene and the class labels. This process is repeated several times, and the genes selected most frequently across the repetitions form the final gene subset. The proposed method is tested on three widely used gene datasets, and the experimental results show that it finds the most stable gene subset with the highest classification accuracy.
Pages: 111-122
Page count: 12
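
The procedure summarized in the abstract (subsample the training set, cluster the genes with K-means, keep one gene per cluster by Pearson correlation with the labels, repeat, then keep the most frequently chosen genes) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the function names (`pearson`, `kmeans`, `stable_gene_subset`), the parameters (`k`, `rounds`, `fraction`, `n_select`) and their default values, the use of the absolute correlation as the gene weight, and the toy data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant sequence has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def stable_gene_subset(X, y, k=3, rounds=30, fraction=0.8, n_select=3, seed=0):
    """X: samples-by-genes matrix (list of rows); y: numeric class labels.

    All parameter names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    n_samples, n_genes = len(X), len(X[0])
    counts = Counter()
    for _ in range(rounds):
        # 1. Randomly draw a proportion of the training samples.
        idx = rng.sample(range(n_samples), max(2, int(fraction * n_samples)))
        labels = [y[i] for i in idx]
        # 2. Cluster the genes: each gene is a point whose coordinates are
        #    its expression values on the drawn samples.
        genes = [[X[i][g] for i in idx] for g in range(n_genes)]
        assign = kmeans(genes, k, rng)
        # 3. From each non-empty cluster keep the gene with the largest
        #    |Pearson r| against the labels (absolute value is an assumption).
        for c in set(assign):
            members = [g for g in range(n_genes) if assign[g] == c]
            best = max(members, key=lambda g: abs(pearson(genes[g], labels)))
            counts[best] += 1
    # 4. Genes chosen most often across the rounds form the final subset.
    return [g for g, _ in counts.most_common(n_select)]


# Toy data (made up): gene 0 tracks the label, genes 1-5 are uniform noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[10 * label + 0.1 * data_rng.random()] + [data_rng.random() for _ in range(5)]
     for label in y]
selected = stable_gene_subset(X, y)
print(selected)
```

Counting how often each gene is picked across the random subsamples is what makes the result less sensitive to any single training split. In this toy setup the label-tracking gene (index 0) is picked in every round, while the noise genes share the remaining picks.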