An empirical Bayes approach to inferring large-scale gene association networks

Cited by: 515
Authors
Schäfer, J [1]
Strimmer, K [1]
Affiliation
[1] Univ Munich, Dept Stat, D-80539 Munich, Germany
DOI
10.1093/bioinformatics/bti062
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem.
Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degree of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology.
Results: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
Pages: 754-764
Page count: 11