A Bayesian Extension of the Hypergeometric Test for Functional Enrichment Analysis

Cited by: 54
Authors
Cao, Jing [1]
Zhang, Song [2]
Affiliations
[1] So Methodist Univ, Dept Stat Sci, Dallas, TX 75275 USA
[2] Univ Texas SW Med Ctr Dallas, Dept Clin Sci, Dallas, TX 75390 USA
Funding
U.S. National Science Foundation;
Keywords
Functional enrichment analysis; Gene ontology; Hypergeometric P-value; Modular enrichment analysis; Non-central hypergeometric distribution; GENE-EXPRESSION DATA; B-CELL; SET ENRICHMENT; LISTS; SELECTION; MODEL; OVERREPRESENTATION; ACTIVATION; NETWORKS; TOOLS;
DOI
10.1111/biom.12122
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Functional enrichment analysis is conducted on high-throughput data to provide functional interpretation for a list of genes or proteins that share a common property, such as being differentially expressed (DE). The hypergeometric P-value has been widely used to investigate whether genes from pre-defined functional terms, for example, Gene Ontology (GO) terms, are enriched in the DE genes. It has three limitations: (1) it is computed independently for each term, thus neglecting biological dependence among terms; (2) it is subject to a size constraint that tends to favor less-specific terms; and (3) it uses information repeatedly, because the true-path rule produces overlapping annotations. We propose a Bayesian approach based on the non-central hypergeometric model. The GO dependence structure is incorporated through a prior on the non-centrality parameters. The likelihood function does not include overlapping information. Inference about enrichment is based on posterior probabilities, which are not subject to a size constraint. The method can detect moderate but consistent enrichment signals and identify sets of closely related, biologically meaningful functional terms rather than isolated terms. We also describe the underlying assumptions and implementation of different methods to provide some theoretical insight, which we illustrate with a simulation study. A real application is presented.
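As an illustration of the quantities named in the abstract, the following minimal Python sketch computes an upper-tail hypergeometric P-value for a single term and, for comparison, the probability mass function of a non-central hypergeometric distribution. It assumes Fisher's form of the non-central distribution, with an odds-ratio parameter `omega` playing the role of the non-centrality parameter; the abstract does not state the parameterization, and the function names and the numbers in the usage lines are hypothetical. It is not the authors' Bayesian procedure, which additionally places a prior on the non-centrality parameters.

```python
from math import comb


def hypergeom_enrichment_pvalue(N, K, n, k):
    """Upper-tail probability P(X >= k) under the central hypergeometric model.

    N: total number of genes, K: genes annotated to the term,
    n: number of DE genes, k: DE genes annotated to the term (k >= 0).
    """
    upper = min(n, K)
    total = comb(N, n)
    # comb returns 0 when n - i exceeds N - K, so impossible counts add nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def noncentral_hypergeom_pmf(N, K, n, omega):
    """Fisher's non-central hypergeometric pmf as a dict {x: P(X = x)}.

    omega is the odds ratio (assumed parameterization); omega = 1 recovers
    the central hypergeometric distribution. Intended for small illustrative
    counts; genome-scale counts would need log-space arithmetic.
    """
    lo = max(0, n - (N - K))
    hi = min(n, K)
    weights = {
        x: comb(K, x) * comb(N - K, n - x) * omega ** x
        for x in range(lo, hi + 1)
    }
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


# Hypothetical numbers: 200 genes, 20 in the term, 30 DE genes, 8 in both.
p_value = hypergeom_enrichment_pvalue(200, 20, 30, 8)
pmf_enriched = noncentral_hypergeom_pmf(200, 20, 30, omega=3.0)
```

With `omega` greater than 1 the mass of the distribution shifts toward larger overlaps, which is how an enrichment effect is represented in this parameterization.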
Pages: 84-94
Page count: 11