GOing Bayesian: model-based gene set analysis of genome-scale data

Cited by: 111
Authors
Bauer, Sebastian [1]
Gagneur, Julien [2]
Robinson, Peter N. [1, 3, 4]
Affiliations
[1] Charite, Inst Med Genet, D-13353 Berlin, Germany
[2] European Mol Biol Lab, D-69117 Heidelberg, Germany
[3] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[4] Charite, Berlin Brandenburg Ctr Regenerat Therapies BCRT, D-13353 Berlin, Germany
Keywords
ENRICHMENT ANALYSIS; ONTOLOGY ANNOTATIONS; TERM ENRICHMENT; EXPRESSION DATA; TRANSCRIPTION;
DOI
10.1093/nar/gkq045
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
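The idea of scoring all categories jointly, rather than one at a time, can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation. It assumes a simple model in which each category is active with prior probability `p`, a gene is truly "on" if it belongs to at least one active category, and the observed gene state differs from the true state with false-positive rate `alpha` and false-negative rate `beta`. The function name, parameter values, and gene and category names are all hypothetical. Inference here is brute-force enumeration of every category configuration, which is only feasible for a handful of categories.

```python
from itertools import product


def category_marginals(categories, observed_on, genes,
                       alpha=0.1, beta=0.25, p=0.2):
    """Posterior probability that each category is active (toy model).

    categories  : dict mapping category name -> set of member genes
    observed_on : set of genes observed as responding
    genes       : iterable of all genes in the study
    alpha, beta : assumed false-positive / false-negative rates
    p           : assumed prior probability that a category is active
    """
    names = sorted(categories)
    weights = {name: 0.0 for name in names}
    total = 0.0
    # Enumerate every on/off configuration of the categories.
    for states in product([False, True], repeat=len(names)):
        active = [n for n, s in zip(names, states) if s]
        # A gene is truly on if any category containing it is active.
        hidden_on = set().union(*[categories[n] for n in active])
        # Prior: independent Bernoulli(p) per category.
        w = 1.0
        for s in states:
            w *= p if s else (1.0 - p)
        # Likelihood of each observed gene state given its true state.
        for g in genes:
            if g in hidden_on:
                w *= (1.0 - beta) if g in observed_on else beta
            else:
                w *= alpha if g in observed_on else (1.0 - alpha)
        total += w
        for n in active:
            weights[n] += w
    # Normalize to obtain marginal posteriors.
    return {n: weights[n] / total for n in names}


# Hypothetical toy data: categories A and B overlap on gene g3.
genes = ["g1", "g2", "g3", "g4", "g5"]
categories = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}}
observed_on = {"g1", "g2", "g3"}

marginals = category_marginals(categories, observed_on, genes)
for name, prob in marginals.items():
    print(f"{name}: {prob:.3f}")
```

In this toy data, category A explains all three responding genes, so the overlapping category B receives a low posterior even though one of its two genes responds. This is the sense in which a joint model accounts for category overlap without a separate multiple-testing correction.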
Pages: 3523-3532
Page count: 10