High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints

Cited by: 194
Authors
Guo, Yuchun [1, 2]
Mahony, Shaun [2]
Gifford, David K. [2]
Affiliations
[1] MIT, Computat & Syst Biol Program, Cambridge, MA 02139 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Funding
US National Institutes of Health (NIH);
Keywords
CHIP-SEQ DATA; EMBRYONIC STEM-CELLS; TRANSPOSABLE ELEMENTS; DNA; IDENTIFICATION; PROTEIN; SITES; COMPLEXES; ENHANCERS; PROFILES;
DOI
10.1371/journal.pcbi.1002638
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1.
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control.
Pages: 14