A primer on gene expression and microarrays for machine learning researchers

Cited by: 26
Authors
Kuo, WP [1]
Kim, EY
Trimarchi, J
Jenssen, TK
Vinterbo, SA
Ohno-Machado, L
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Decis Syst Grp, Boston, MA 02115 USA
[2] Harvard Univ, Div Hlth Sci & Technol, Cambridge, MA 02138 USA
[3] MIT, Cambridge, MA USA
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[5] Harvard Univ, Sch Dent Med, Dept Oral Med Infect & Immun, Boston, MA 02115 USA
[6] PubGene Inc, Oslo, Norway
Keywords
bioinformatics; microarrays; machine learning;
DOI
10.1016/j.jbi.2004.07.002
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Data originating from biomedical experiments have provided machine learning researchers with an important source of motivation for developing and evaluating new algorithms. A new wave of algorithmic development has been initiated with the publication of gene expression data derived from microarrays. Microarray data analysis is particularly challenging given the large number of measurements (typically on the order of thousands) that are reported for relatively few samples (typically on the order of dozens). Many data sets are now available on the web. It is important that machine learning researchers understand how data are obtained and which assumptions are necessary in the analysis. Microarray data have the potential to cause significant impact in machine learning research, not just as a rich and realistic source of cases for testing new algorithms, as has been the UCI machine learning repository in the past decades, but also as a main motivation for their development. In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations regarding the construction of supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originating from gene expression microarrays. © 2004 Elsevier Inc. All rights reserved.
Pages: 293-303
Number of pages: 11