Gene selection from microarray data for cancer classification - a machine learning approach

Cited by: 248
Authors
Wang, Y
Tetko, IV
Hall, MA
Frank, E
Facius, A
Mayer, KFX
Mewes, HW
Affiliations
[1] German Res Ctr Environm & Hlth, Inst Bioinformat, D-85764 Neuherberg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Tech Univ Munich, Wissenschaftszentrum Weihenstephan, Dept Genome Oriented Bioinformat, D-85354 Freising Weihenstephan, Germany
Keywords
microarray; gene selection; machine learning; cancer classification; feature selection
DOI
10.1016/j.compbiolchem.2004.11.001
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data typically comprise a small number of samples, each with a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. To extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naive Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that the combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper to discuss both computational and biological evidence for the involvement of zyxin in leukaemogenesis. © 2004 Elsevier Ltd. All rights reserved.
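The abstract names a correlation-based feature selector but does not describe how it works. The sketch below illustrates the general idea behind correlation-based feature selection (CFS): a gene subset scores well when its genes correlate with the class label but not with each other. It is not the authors' implementation. It assumes absolute Pearson correlation as the correlation measure (CFS on discretized data commonly uses symmetrical uncertainty instead), a simple greedy forward search, and made-up toy expression values. The names `pearson`, `cfs_merit` and `greedy_cfs` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(features, labels, subset):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean |feature-class| correlation over the subset,
    r_ff the mean |feature-feature| correlation over all pairs in it.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[i], labels)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels, tol=1e-12):
    """Forward selection: add the gene that most improves the merit, stop when none does."""
    selected, best = [], 0.0
    remaining = set(range(len(features)))
    while remaining:
        candidate, candidate_merit = None, best
        for i in sorted(remaining):
            merit = cfs_merit(features, labels, selected + [i])
            if merit > candidate_merit + tol:
                candidate, candidate_merit = i, merit
        if candidate is None:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_merit
    return selected, best


# Toy data (invented for illustration): six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1, 2, 1, 5, 6, 5],      # gene 0: tracks the class label
    [2, 4, 2, 10, 12, 10],   # gene 1: exact multiple of gene 0, fully redundant
    [3, 1, 2, 2, 3, 1],      # gene 2: unrelated to the class label
]
selected, merit = greedy_cfs(genes, labels)
print(selected, round(merit, 3))
```

On this toy input the search keeps gene 0 only: gene 1 adds no merit because it is perfectly correlated with gene 0, and gene 2 lowers the merit because it carries no class information. On real microarray data with thousands of genes, the pairwise correlations would normally be cached rather than recomputed on every call.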
Pages: 37-46
Page count: 10