Improving reliability of gene selection from microarray functional genomics data

Cited by: 17
Authors
Fu, LM [1]
Youn, ES
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Pacific TB & Canc Res Org, Los Angeles, CA USA
[3] Univ Florida, Gainesville, FL 32611 USA
Source
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE | 2003 / Vol. 7 / No. 03
Funding
National Science Foundation (USA);
Keywords
bootstrap; functional genomics; gene expression; gene selection; microarray; support vector machine (SVM);
DOI
10.1109/TITB.2003.816558
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Constructing a classifier based on microarray gene expression data has recently emerged as an important problem for cancer classification. Recent results have suggested the feasibility of constructing such a classifier with reasonable predictive accuracy when only a small number of cancer tissue samples of known type are available. Difficulty arises from the fact that each sample contains the expression data of a vast number of genes, and these genes may interact with one another. Selection of a small number of critical genes is fundamental to correctly analyzing the otherwise overwhelming data. It is essential to use a multivariate approach for capturing the correlated structure in the data. However, the curse of dimensionality raises concern about the reliability of the selected genes. Here, we present a new gene selection method in which error and repeatability of selected genes are assessed within the context of M-fold cross-validation. In particular, we show that the method is able to identify source variables underlying data generation.
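The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of assessing error and repeatability of selected genes inside M-fold cross-validation. It is not the authors' method. Assumptions made for illustration: genes are ranked by the absolute weights of a ridge-regularized least-squares linear classifier (a stand-in for the SVM named in the keywords), labels are coded as -1/+1, and the data are synthetic. The names `rank_genes`, `fit_predict`, and `cv_gene_selection` are hypothetical.

```python
import numpy as np


def ridge_weights(X, y, ridge=1.0):
    """Weights of a ridge least-squares linear classifier (no bias term).

    Uses the dual form w = X^T (X X^T + ridge*I)^-1 y, which is cheap
    when there are far fewer samples than genes.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + ridge * np.eye(n), y)
    return X.T @ alpha


def rank_genes(X, y):
    """Return gene indices sorted by decreasing absolute weight."""
    return np.argsort(-np.abs(ridge_weights(X, y)))


def fit_predict(X_train, y_train, X_test):
    """Train on the given genes and predict -1/+1 labels for X_test."""
    return np.sign(X_test @ ridge_weights(X_train, y_train))


def cv_gene_selection(X, y, n_folds=5, n_genes=10, seed=0):
    """M-fold CV: select genes on each training split, then record the
    held-out error and how often each gene was selected across folds."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    counts = np.zeros(n_features, dtype=int)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_samples), test_idx)
        # Center with training statistics only, to avoid leaking test data.
        mu = X[train_idx].mean(axis=0)
        X_tr, X_te = X[train_idx] - mu, X[test_idx] - mu
        # Selection happens inside the fold, never on held-out samples.
        selected = rank_genes(X_tr, y[train_idx])[:n_genes]
        counts[selected] += 1
        pred = fit_predict(X_tr[:, selected], y[train_idx], X_te[:, selected])
        errors.append(np.mean(pred != y[test_idx]))
    return float(np.mean(errors)), counts / n_folds


# Synthetic data (assumption): 40 samples, 200 genes, first 3 informative.
data_rng = np.random.default_rng(1)
y = np.repeat([-1.0, 1.0], 20)
X = data_rng.normal(size=(40, 200))
X[:, :3] += 2.0 * y[:, None]

error, freq = cv_gene_selection(X, y, n_folds=5, n_genes=5)
print("mean held-out error:", error)
print("most repeatable genes:", np.argsort(-freq)[:5])
```

In this sketch, `freq[j]` is the fraction of folds in which gene `j` was selected. A gene selected in every fold is more repeatable than one selected once, and the mean held-out error gives the accompanying accuracy estimate.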
Pages: 191-196
Page count: 6