Gene selection from microarray data for cancer classification - a machine learning approach

Times cited: 248
Authors
Wang, Y
Tetko, IV
Hall, MA
Frank, E
Facius, A
Mayer, KFX
Mewes, HW
Affiliations
[1] German Res Ctr Environm & Hlth, Inst Bioinformat, D-85764 Neuherberg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Tech Univ Munich, Wissenschaftszentrum Weihenstephan, Dept Genome Oriented Bioinformat, D-85354 Freising Weihenstephan, Germany
Keywords
microarray; gene selection; machine learning; cancer classification; feature selection
DOI
10.1016/j.compbiolchem.2004.11.001
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contain a small number of samples, each described by a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, we systematically investigated feature selection algorithms in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naive Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper that discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis. © 2004 Elsevier Ltd. All rights reserved.
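The abstract names a correlation-based feature selector without stating its scoring rule. In Hall's correlation-based feature selection (CFS), a subset S of k features is scored by Merit(S) = k · mean(r_cf) / sqrt(k + k(k − 1) · mean(r_ff)), where mean(r_cf) is the average feature-class correlation and mean(r_ff) the average feature-feature inter-correlation, so genes that track the class but are not redundant with one another are favoured. The Python sketch below is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes a two-class problem with labels coded 0/1 (for example ALL vs. AML), uses absolute Pearson correlation in place of the information-theoretic measure Hall's formulation typically uses, and searches subsets greedily forward because the search strategy is not given here. The function names `pearson`, `cfs_merit` and `cfs_forward_select` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[int],
              r_cf: Sequence[float],
              r_ff: Sequence[Sequence[float]]) -> float:
    """CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k
    # Average correlation over all unordered pairs of distinct genes in the subset.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_forward_select(X: List[List[float]], y: List[int],
                       max_features: Optional[int] = None) -> Tuple[List[int], float]:
    """Greedy forward search: repeatedly add the gene that gives the highest merit.

    X holds one row per sample and one column per gene; y holds 0/1 class labels
    (an assumption of this sketch). Returns the selected gene indices and their merit.
    """
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    # Absolute values, so a strong negative association also counts as relevant.
    r_cf = [abs(pearson(col, y)) for col in columns]
    # Full gene-gene matrix: fine for a toy example, too costly for thousands of genes.
    r_ff = [[abs(pearson(ci, cj)) for cj in columns] for ci in columns]
    limit = n_genes if max_features is None else max_features
    selected: List[int] = []
    best = 0.0
    while len(selected) < limit:
        candidates = [g for g in range(n_genes) if g not in selected]
        if not candidates:
            break
        # Highest merit wins; ties go to the lower gene index.
        merit, gene = max(((cfs_merit(selected + [g], r_cf, r_ff), g) for g in candidates),
                          key=lambda t: (t[0], -t[1]))
        if merit <= best:  # stop once no candidate improves the subset
            break
        selected.append(gene)
        best = merit
    return selected, best
```

The merit formula is what lets CFS reject redundant genes: two genes that each correlate 0.8 with the class but are perfectly correlated with each other score 0.8 as a pair, no better than either alone, whereas two uncorrelated genes with the same class correlation score about 1.13. On a toy matrix with six samples, one gene that separates the classes and one noise gene, the search keeps only the first gene. In a workflow like the one the abstract describes, the selected subset would then be passed to a classifier such as a decision tree, naive Bayes or a support vector machine.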
Pages: 37-46
Page count: 10