Stable Gene Selection from Microarray Data via Sample Weighting

Cited by: 75
Authors
Yu, Lei [1]
Han, Yue [1]
Berens, Michael E. [2]
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13902 USA
[2] Translat Genom Res Inst, Canc & Cell Biol Div, Phoenix, AZ 85004 USA
Funding
National Science Foundation (USA);
Keywords
Feature selection; gene selection; stability; classification; gene expression microarray; SVM-RFE; CANCER CLASSIFICATION; TISSUE CLASSIFICATION; STABILITY; MULTICLASS; PATTERNS;
DOI
10.1109/TCBB.2011.47
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Feature selection from gene expression microarray data is a widely used technique for selecting candidate genes in cancer studies. Besides the predictive ability of the selected genes, an important criterion in evaluating a selection method is the stability of the selected genes. Experts tend to have higher confidence in a selection method that selects similar sets of genes when the samples vary. However, a common problem with existing feature selection methods for gene expression data is that the genes selected by the same method often vary significantly with sample variations. In this work, we propose a general framework of sample weighting to improve the stability of feature selection methods under sample variations. The framework first weights each sample in a given training set according to its influence on the estimation of feature relevance, and then provides the weighted training set to a feature selection method. We also develop an efficient margin-based sample weighting algorithm under this framework. Experiments on a set of microarray data sets show that the proposed algorithm significantly improves the stability of representative feature selection algorithms such as SVM-RFE and ReliefF without sacrificing their classification performance. Moreover, the proposed algorithm leads to more stable gene signatures than the state-of-the-art ensemble method, particularly for small signature sizes.
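
The two-step framework described in the abstract (weight each sample, then hand the weighted training set to a feature selector) might be sketched as below. This is only an illustrative sketch, not the authors' algorithm: the abstract does not give the weighting formula, so the nearest-hit/nearest-miss margin vectors, the inverse-average-distance weights, the weighted t-like relevance score, and all function names are assumptions. It also assumes binary labels coded 0/1 and at least two samples per class.

```python
import numpy as np

def margin_vectors(X, y):
    """Assumed margin definition: per feature, |x - nearest miss| - |x - nearest hit|."""
    # Pairwise Manhattan distances between samples; a sample is never its own neighbour.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(D, np.inf)
    M = np.empty_like(X, dtype=float)
    for i in range(X.shape[0]):
        same = y == y[i]
        hit = np.argmin(np.where(same, D[i], np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, D[i], np.inf))  # nearest other-class sample
        M[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return M

def sample_weights(X, y):
    """Assumed weighting: samples whose margin vectors are atypical get lower weight."""
    M = margin_vectors(X, y)
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    avg = dist.sum(axis=1) / (len(M) - 1)  # mean distance to the other samples
    w = 1.0 / (avg + 1e-12)
    return w / w.sum()                     # normalise so the weights sum to 1

def weighted_scores(X, y, w):
    """Toy weighted relevance: |difference of weighted class means| / pooled weighted std."""
    stats = []
    for c in (0, 1):
        Xc = X[y == c]
        wc = w[y == c] / w[y == c].sum()
        mu = wc @ Xc
        var = wc @ (Xc - mu) ** 2
        stats.append((mu, var))
    (mu0, v0), (mu1, v1) = stats
    return np.abs(mu1 - mu0) / np.sqrt((v0 + v1) / 2 + 1e-12)

def select_top_k(X, y, k):
    """Step 1: weight samples. Step 2: rank features on the weighted training set."""
    w = sample_weights(X, y)
    scores = weighted_scores(X, y, w)
    return np.argsort(-scores)[:k]
```

In this sketch the selector in step 2 is a simple weighted filter score; the abstract names SVM-RFE and ReliefF as the selectors actually evaluated, and either could take the place of `weighted_scores` if it accepts per-sample weights.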
Pages: 262-272
Page count: 11