Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

Times cited: 70
Authors
Schmidt, Florian [1 ,2 ]
Gasparoni, Nina [3 ]
Gasparoni, Gilles [3 ]
Gianmoena, Kathrin [4 ]
Cadenas, Cristina [4 ]
Polansky, Julia K. [5 ]
Ebert, Peter [2 ,6 ]
Nordstroem, Karl [3 ]
Barann, Matthias [7 ]
Sinha, Anupam [7 ]
Froehler, Sebastian [8 ]
Xiong, Jieyi [8 ]
Amirabad, Azim Dehghani [1 ,2 ,6 ]
Ardakani, Fatemeh Behjati [1 ,2 ]
Hutter, Barbara [9 ]
Zipprich, Gideon
Felder, Baerbel [10 ]
Eils, Juergen [10 ]
Brors, Benedikt [9 ]
Chen, Wei [8 ]
Hengstler, Jan G. [4 ]
Hamann, Alf [6 ]
Lengauer, Thomas [2 ]
Rosenstiel, Philip [7 ]
Walter, Joern [3 ]
Schulz, Marcel H. [1 ,2 ]
Affiliations
[1] Cluster Excellence Multimodal Comp & Interact, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[2] Max Planck Inst Informat, Computat Biol & Appl Algorithm, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[3] Univ Saarland, Dept Genet, D-66123 Saarbrucken, Germany
[4] Leibniz Res Ctr Working Environm & Human Factors, D-44139 Dortmund, Germany
[5] German Rheumatism Res Ctr, Expt Rheumatol, D-10117 Berlin, Germany
[6] Int Max Planck Res Sch Comp Sci, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[7] Univ Kiel, Inst Clin Mol Biol, D-24105 Kiel, Germany
[8] Max Delbruck Ctr Mol Med, Berlin Inst Med Syst Biol, D-13092 Berlin, Germany
[9] Deutsch Krebsforschungszentrum, Appl Bioinformat, D-69120 Heidelberg, Germany
[10] Deutsch Krebsforschungszentrum, Data Management & Genom IT, D-69120 Heidelberg, Germany
Keywords
CHIP-SEQ DATA; COEXPRESSION NETWORK ANALYSIS; DNA; SITES; GENOME; INTEGRATION; HYPERSENSITIVITY; REGULARIZATION; FOOTPRINTS; EXPANSION
DOI
10.1093/nar/gkw1061
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The binding and contribution of transcription factors (TFs) to cell-specific gene expression are often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, histone marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find that low-affinity binding sites improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators, as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.
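The abstract describes TEPIC only at a high level: PWM-derived TF affinities inside open-chromatin regions, weighted by signal intensity and turned into gene-based scores. The Python sketch below illustrates that general idea under stated assumptions; it is not TEPIC's implementation. The saturating, TRAP-inspired affinity model, the parameters `lam` and `r0`, the toy GATA-like PWM, the ±25 kb window around the transcription start site (TSS), and the exponential distance decay with `d0 = 5000` bp are all illustrative choices introduced here, and `Region`, `region_affinity`, and `gene_score` are hypothetical names. Because every motif window contributes a graded occupancy instead of a 0/1 hit, weak (low-affinity) sites still add to the score, which is the contrast with presence/absence calls that the abstract draws.

```python
import math
from dataclasses import dataclass

# Hypothetical toy position count matrix with consensus "GATA" (illustration only).
# Each dict gives the observed base counts at one motif position.
TOY_PWM = [
    {"A": 1, "C": 0, "G": 8, "T": 1},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatch_energy(site: str, pwm: list[dict[str, int]], lam: float = 0.7) -> float:
    """Energy relative to the consensus; 0.0 when every base is the best-scoring one.

    Pseudocounts of 1 keep unseen bases (and 'N') finite. `lam` is an assumed scaling.
    """
    energy = 0.0
    for base, column in zip(site, pwm):
        best = max(column.values())
        energy += math.log((best + 1) / (column.get(base, 0) + 1))
    return energy / lam


def region_affinity(seq: str, pwm: list[dict[str, int]],
                    r0: float = 1.0, lam: float = 0.7) -> float:
    """Sum of saturating occupancies x / (1 + x) over all windows on both strands.

    No presence/absence cut-off: many weak windows can add up to a sizeable affinity.
    """
    seq = seq.upper()
    width = len(pwm)
    total = 0.0
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            x = r0 * math.exp(-mismatch_energy(strand[i:i + width], pwm, lam))
            total += x / (1.0 + x)
    return total


@dataclass
class Region:
    center: int          # genomic coordinate of the open-chromatin region midpoint
    seq: str             # DNA sequence of the region
    signal: float = 1.0  # assumed normalised DNaseI-seq/NOMe-seq/HM signal; 1.0 = unweighted


def gene_score(tss: int, regions: list[Region], pwm: list[dict[str, int]],
               half_window: int = 25_000, d0: float = 5_000.0) -> float:
    """Signal-weighted affinity summed over regions near the TSS, decayed by distance."""
    score = 0.0
    for region in regions:
        d = abs(region.center - tss)
        if d > half_window:
            continue  # region lies outside the assumed search window
        score += region_affinity(region.seq, pwm) * region.signal * math.exp(-d / d0)
    return score


if __name__ == "__main__":
    regions = [
        Region(center=1_200, seq="ttGATAagcGATTacc", signal=3.0),
        Region(center=-8_000, seq="cccGATAccc", signal=1.5),
        Region(center=60_000, seq="GATAGATA", signal=10.0),  # outside the window
    ]
    print(f"GATA-like score for a gene with TSS at 0: {gene_score(0, regions, TOY_PWM):.4f}")
```

Setting `signal=1.0` for every region gives a purely sequence-based score, while passing a measured open-chromatin or HM signal weights each region's affinity by its intensity, in the spirit of the abstract's use of signal as a quantitative measure of binding strength. One such score per TF and gene could then serve as the feature matrix for a regression model of expression.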
Pages: 54-66
Page count: 13