Improving reliability of gene selection from microarray functional genomics data

Cited by: 17
Authors
Fu, LM [1]
Youn, ES
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Pacific TB & Canc Res Org, Los Angeles, CA USA
[3] Univ Florida, Gainesville, FL 32611 USA
Source
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE | 2003 / Vol. 7 / No. 3
Funding
National Science Foundation (USA);
Keywords
bootstrap; functional genomics; gene expression; gene selection; microarray; support vector machine (SVM);
DOI
10.1109/TITB.2003.816558
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Constructing a classifier based on microarray gene expression data has recently emerged as an important problem for cancer classification. Recent results have suggested the feasibility of constructing such a classifier with reasonable predictive accuracy even when only a small number of cancer tissue samples of known type are available. Difficulty arises from the fact that each sample contains the expression data of a vast number of genes, and these genes may interact with one another. Selecting a small number of critical genes is fundamental to correctly analyzing the otherwise overwhelming data. It is essential to use a multivariate approach to capture the correlated structure in the data. However, the curse of dimensionality raises concern about the reliability of the selected genes. Here, we present a new gene selection method in which the error and repeatability of selected genes are assessed within the context of M-fold cross-validation. In particular, we show that the method is able to identify source variables underlying data generation.
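The idea of assessing both error and repeatability of selected genes inside M-fold cross-validation can be illustrated with a minimal sketch. This is not the authors' method: the record's keywords point to a support vector machine and bootstrap, whereas the sketch below substitutes a simple univariate score and a nearest-centroid classifier so that it needs only NumPy. All function names, parameters, and the synthetic data are hypothetical. Genes are selected within each training fold only, which avoids the optimistic bias that arises when selection sees the held-out samples.

```python
import numpy as np

def select_top_genes(X, y, k):
    """Rank genes by |difference of class means| / pooled within-class std.
    Univariate stand-in for illustration, not the paper's selection rule."""
    a, b = X[y == 0], X[y == 1]
    pooled_std = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2.0) + 1e-12
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_std
    return np.argsort(score)[::-1][:k]

def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Misclassification rate of a nearest-centroid classifier (assumed stand-in)."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred != y_te))

def cv_gene_selection(X, y, k=5, m_folds=5, seed=0):
    """Return (selection frequency per gene, mean held-out error) over M folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, m_folds)
    counts = np.zeros(X.shape[1], dtype=int)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        # Selection uses the training fold only.
        genes = select_top_genes(X[train], y[train], k)
        counts[genes] += 1
        errors.append(nearest_centroid_error(
            X[train][:, genes], y[train], X[held_out][:, genes], y[held_out]))
    return counts / m_folds, float(np.mean(errors))

# Synthetic data (hypothetical): 40 samples, 200 genes, genes 0-2 carry the signal.
data_rng = np.random.default_rng(1)
y_demo = np.repeat([0, 1], 20)
X_demo = data_rng.normal(size=(40, 200))
X_demo[y_demo == 1, :3] += 4.0

freq, err = cv_gene_selection(X_demo, y_demo, k=3, m_folds=5, seed=0)
print("genes selected in every fold:", np.flatnonzero(freq == 1.0))
print("mean held-out error:", err)
```

A gene whose selection frequency is close to 1 is chosen repeatedly across folds, which is one simple way to read "repeatability"; the mean held-out error plays the role of the error estimate.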
Pages: 191-196
Number of pages: 6