The Next Generation of Transcription Factor Binding Site Prediction

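Cited by: 118
Authors
Mathelier, Anthony [1]
Wasserman, Wyeth W. [1]
Affiliation
[1] Univ British Columbia, Dept Med Genet, Ctr Mol Med & Therapeut, Child & Family Res Inst, Vancouver, BC, Canada
Funding
Canada Foundation for Innovation; Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research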
Keywords
PROTEIN-DNA INTERACTIONS; SEQUENCE; IDENTIFICATION; DISCOVERY; DATABASE; DETERMINANTS; RECOGNITION; AFFINITIES; ALGORITHM; ELEMENTS
DOI
10.1371/journal.pcbi.1003214
CLC classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Finding where transcription factors (TFs) bind to DNA is key to deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) relies on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the nucleotide patterns observed in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide contributes independently to the DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible and can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq sequences from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that scores from TFFMs constructed from ChIP-seq data correlate with published, experimentally measured DNA-binding affinities. Finally, TFFMs allow straightforward computation of an integrated TF occupancy score across a sequence.
These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
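To make the contrast between independent-position PWMs and dependency-aware models concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' TFFM implementation (TFFMs are hidden Markov models). As a simpler stand-in, it compares a PWM, which scores each position independently, with a position-specific first-order Markov model, which conditions each nucleotide on the one before it. A first-order TFFM can capture this same kind of adjacent-nucleotide dependency. The uniform background, the pseudocount of 0.5, the toy sites and all function names are hypothetical. The occupancy function simply sums likelihood ratios over every window on the forward strand. It is an illustrative substitute, not the paper's HMM-based occupancy score.

```python
import math
from collections import Counter

BASES = "ACGT"
# Assumption: uniform background nucleotide frequencies.
BACKGROUND = {b: 0.25 for b in BASES}


def pwm_from_sites(sites, pseudocount=0.5):
    """Zeroth-order model (PWM): independent nucleotide probabilities per position."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def first_order_from_sites(sites, pseudocount=0.5):
    """Hypothetical first-order model: P(x_0), then P(x_i | x_{i-1}) for i >= 1."""
    start = pwm_from_sites(sites, pseudocount)[0]
    conditionals = []
    for i in range(1, len(sites[0])):
        pairs = Counter((site[i - 1], site[i]) for site in sites)
        table = {}
        for prev in BASES:
            row_total = sum(pairs[(prev, b)] for b in BASES) + 4 * pseudocount
            table[prev] = {b: (pairs[(prev, b)] + pseudocount) / row_total for b in BASES}
        conditionals.append(table)
    return start, conditionals


def pwm_log_odds(pwm, seq):
    """Sum of per-position log2 ratios against the background."""
    return sum(math.log2(pwm[i][c] / BACKGROUND[c]) for i, c in enumerate(seq))


def first_order_log_odds(model, seq):
    """Log2 ratio where each nucleotide is conditioned on its predecessor."""
    start, conditionals = model
    score = math.log2(start[seq[0]] / BACKGROUND[seq[0]])
    for i in range(1, len(seq)):
        p = conditionals[i - 1][seq[i - 1]][seq[i]]
        score += math.log2(p / BACKGROUND[seq[i]])
    return score


def occupancy(score_fn, model, width, seq):
    """Illustrative occupancy: sum of likelihood ratios over all forward-strand windows."""
    return sum(2 ** score_fn(model, seq[j:j + width]) for j in range(len(seq) - width + 1))


# Toy sites (hypothetical): the 2nd and 3rd positions are coupled (C->G, G->C).
sites = ["ACGT", "AGCT"] * 5
pwm = pwm_from_sites(sites)
fo_model = first_order_from_sites(sites)

# The PWM cannot see the coupling, so ACGT and ACCT receive equal scores.
print(pwm_log_odds(pwm, "ACGT"), pwm_log_odds(pwm, "ACCT"))
# The first-order model ranks the observed pairing ACGT above ACCT.
print(first_order_log_odds(fo_model, "ACGT"), first_order_log_odds(fo_model, "ACCT"))
print(occupancy(first_order_log_odds, fo_model, 4, "TTACGTTT"))
```

Because the toy sites only ever pair C with G and G with C in the middle, the PWM assigns its probability mass to each nucleotide separately and treats the unseen combination ACCT as equally good. The first-order model penalizes it. Variable-length motifs, which TFFMs also handle, are outside the scope of this sketch.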
Pages: 18