GOing Bayesian: model-based gene set analysis of genome-scale data

Cited by: 111
Authors
Bauer, Sebastian [1]
Gagneur, Julien [2]
Robinson, Peter N. [1, 3, 4]
Affiliations
[1] Charite, Inst Med Genet, D-13353 Berlin, Germany
[2] European Mol Biol Lab, D-69117 Heidelberg, Germany
[3] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[4] Charite, Berlin Brandenburg Ctr Regenerat Therapies BCRT, D-13353 Berlin, Germany
Keywords
ENRICHMENT ANALYSIS; ONTOLOGY ANNOTATIONS; TERM ENRICHMENT; EXPRESSION DATA; TRANSCRIPTION;
DOI
10.1093/nar/gkq045
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
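The idea of modeling gene response as a function of category activation can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a simple parameterization that the abstract does not spell out: each category is active with prior probability `p`, a gene is expected to respond if any active category contains it, and observations flip with a false-positive rate `alpha` and a false-negative rate `beta`. All names, parameter values, and the gene sets are hypothetical. Inference is done by exhaustive enumeration, which is only feasible for a handful of categories.

```python
from itertools import product


def marginal_posteriors(categories, observed, genes, p=0.2, alpha=0.1, beta=0.2):
    """Exact marginal posterior that each category is active (toy model).

    Assumed parameters (hypothetical, chosen for illustration only):
      p     - prior probability that a category is active
      alpha - probability a gene is observed "on" although no active category contains it
      beta  - probability a gene is observed "off" although an active category contains it
    """
    names = sorted(categories)
    weights = {name: 0.0 for name in names}
    total = 0.0
    # Enumerate every on/off configuration of the categories.
    for states in product([False, True], repeat=len(names)):
        active = [n for n, s in zip(names, states) if s]
        # Genes expected to respond: the union of all active categories.
        expected_on = set().union(*(categories[n] for n in active))
        # Prior weight of this configuration.
        w = 1.0
        for s in states:
            w *= p if s else 1.0 - p
        # Likelihood of the observed gene states under this configuration.
        for g in genes:
            if g in expected_on:
                w *= (1.0 - beta) if g in observed else beta
            else:
                w *= alpha if g in observed else (1.0 - alpha)
        total += w
        for n in active:
            weights[n] += w
    return {n: weights[n] / total for n in names}


# Hypothetical toy data: category B is nested inside A, C is unrelated.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
categories = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4"},
    "C": {"g5", "g6"},
}
observed = {"g1", "g2", "g3", "g4"}

posteriors = marginal_posteriors(categories, observed, genes)
for name, prob in posteriors.items():
    print(f"{name}: {prob:.3f}")
```

In this toy setting, category A accounts for all four responding genes and receives a high posterior. The nested category B adds no explanatory power once A is active, so its posterior stays low even though every gene in B responds. This is the kind of overlap handling the abstract describes, shown here on a scale where exact enumeration is possible; realistic numbers of categories would need approximate inference.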
Pages: 3523 - 3532
Page count: 10