Learning Gene Networks under SNP Perturbations Using eQTL Datasets

Cited by: 32
Authors
Zhang, Lingxue [1]
Kim, Seyoung [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Lane Ctr Computat Biol, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation;
Keywords
MULTISTRESS RESPONSE; SHRINKAGE; SELECTION; COMPLEXITY; EXPRESSION; REGRESSION; GENOMICS; PROTEIN; MODEL;
DOI
10.1371/journal.pcbi.1003420
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
Author Summary
A complete understanding of how gene regulatory networks are wired in a biological system is important in many areas of biology and medicine. The most popular method for investigating a gene network has been based on experimental perturbation studies, where the expression of a gene is experimentally manipulated to observe how this perturbation affects the expressions of other genes. Such experimental methods are costly, laborious, and do not scale to a perturbation of more than two genes at a time. As an alternative, genetical genomics approach uses genetic variants as naturally-occurring perturbations of gene regulatory system and learns gene networks by decoding the perturbation effects by genetic variants, given population gene-expression and genotype data. However, since there exist millions of genetic variants in genomes that simultaneously perturb a gene network, it is not obvious how to decode the effects of such multifactorial perturbations from data. Our statistical approach overcomes this computational challenge and recovers gene networks under SNP perturbations using probabilistic graphical models. As population gene-expression and genotype datasets are routinely collected to study genetic architectures of complex diseases and phenotypes, our approach can directly leverage these existing datasets to provide a more effective way of identifying gene networks.
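The distinction the abstract draws between direct SNP perturbations and their indirect propagation through the gene network can be illustrated with a small numerical sketch. This record does not give the model's equations, so the sketch assumes the commonly used conditional Gaussian parameterization p(y | x) ∝ exp(-½ yᵀ Θ_yy y - xᵀ Θ_xy y), under which E[y | x] = Bᵀx with B = -Θ_xy Θ_yy⁻¹. The matrices `theta_yy` and `theta_xy`, their values, and the function name `overall_snp_effects` are hypothetical and chosen only for illustration; this is not the authors' learning algorithm, which estimates these parameters from data.

```python
import numpy as np

# Hypothetical toy parameters (not taken from the paper): 2 SNPs, 3 genes.
# theta_yy: gene-gene network, a sparse symmetric positive-definite matrix
# in which gene 0 - gene 1 and gene 1 - gene 2 are connected.
theta_yy = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

# theta_xy: direct SNP-to-gene perturbations (rows: SNPs, columns: genes).
# SNP 0 directly perturbs only gene 0; SNP 1 perturbs no gene.
theta_xy = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])


def overall_snp_effects(theta_xy, theta_yy):
    """Return B = -theta_xy @ inv(theta_yy), so that E[y | x] = B.T @ x.

    Assumes theta_yy is symmetric, so a linear solve replaces the inverse.
    """
    return -np.linalg.solve(theta_yy, theta_xy.T).T


B = overall_snp_effects(theta_xy, theta_yy)
print(B)
```

In this toy setup SNP 0 has a direct edge to gene 0 only, yet its row of `B` is nonzero for all three genes, with magnitude decaying along the chain gene 0, gene 1, gene 2. That is the sense in which a sparse set of direct perturbations can induce a dense set of indirect effects.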
Pages: 20