The Next Generation of Transcription Factor Binding Site Prediction

Cited by: 118
Authors
Mathelier, Anthony [1]
Wasserman, Wyeth W. [1]
Affiliations
[1] Univ British Columbia, Dept Med Genet, Ctr Mol Med & Therapeut, Child & Family Res Inst, Vancouver, BC, Canada
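Funding
Natural Sciences and Engineering Research Council of Canada; Canada Foundation for Innovation; Canadian Institutes of Health Research;
Keywords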
PROTEIN-DNA INTERACTIONS; SEQUENCE; IDENTIFICATION; DISCOVERY; DATABASE; DETERMINANTS; RECOGNITION; AFFINITIES; ALGORITHM; ELEMENTS;
DOI
10.1371/journal.pcbi.1003214
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible and can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence.
These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
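
To make the PWM baseline described in the abstract concrete, the following is a minimal Python sketch of classical log-odds PWM scoring. It is not the authors' implementation and does not reproduce the TFFM method. The function names (`build_pwm`, `score_window`, `best_hit`), the pseudocount, the uniform background, and the four toy sites are all assumptions chosen for illustration. Each position contributes to the score independently of its neighbours, which is the assumption that TFFMs are designed to relax.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumptions (illustrative only): a uniform background of 0.25
    per base and a pseudocount of 1 added to every cell.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for base in ALPHABET:
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum per-position log-odds: positions are scored independently."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window.

    The sequence must contain only A, C, G, T and be at least as
    long as the PWM.
    """
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


# Toy aligned sites, invented for illustration (not data from the paper).
SITES = ["CACGTG", "CACGTG", "CACGTG", "CATGTG"]
PWM = build_pwm(SITES)

if __name__ == "__main__":
    score, start = best_hit(PWM, "TTTCACGTGAAA")
    print(f"best window starts at {start} with score {score:.2f}")
```

Because the score is a plain sum over columns, this model cannot express that a base at one position makes a particular base at the next position more or less likely, and it cannot score motifs of varying length. Those are the two limitations the abstract attributes to PWMs.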
Pages: 18