Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Cited by: 6
Authors
Yu, Chun-Ping [1]
Kuo, Chen-Hao [1]
Nelson, Chase W. [1,2]
Chen, Chi-An [1]
Soh, Zhi Thong [1]
Lin, Jinn-Jy [1]
Hsiao, Ru-Xiu [1]
Chang, Chih-Yao [1]
Li, Wen-Hsiung [1,3]
Affiliations
[1] Acad Sinica, Biodivers Res Ctr, Taipei 115, Taiwan
[2] Amer Museum Nat Hist, Inst Comparat Genom, New York, NY 10024 USA
[3] Univ Chicago, Dept Ecol & Evolut, 940 E 57th St, Chicago, IL 60637 USA
Keywords
ChIP-seq; transcription factor; binding site; promoter; position weight matrix; CHROMATIN; ENCODE; ALIGNMENT; PROTEINS; FEATURES
DOI
10.1073/pnas.2026754118
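Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract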
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density within ±2 kb of transcription start sites (TSSs), with a peak at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 tethered-binding TF pairs (115 of them completely tethered), indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
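The motif-occurrence criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy count matrix, the log-odds threshold, the `min_hits` cutoff, and all function names are hypothetical choices made only for illustration, and only the forward strand is scanned. The sketch converts a count matrix into a position weight matrix (PWM), scans peak sequences for windows scoring above the threshold, and accepts an experiment if the total number of hits reaches the cutoff.

```python
import math

# Hypothetical toy count matrix for a 4-bp motif (one dict per position).
# The consensus is TGAC; real matrices come from motif discovery tools.
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 1, "T": 9},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def count_occurrences(pwm, sequence, threshold):
    """Count forward-strand windows whose PWM score is >= threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits += 1
    return hits


def passes_motif_count_filter(pwm, peak_sequences, threshold, min_hits):
    """Return (accepted, total_hits) for a set of peak sequences.

    min_hits is an assumed cutoff, not a value taken from the paper.
    """
    total = sum(count_occurrences(pwm, seq, threshold) for seq in peak_sequences)
    return total >= min_hits, total


if __name__ == "__main__":
    pwm = build_pwm(TOY_COUNTS)
    peaks = ["AATGACGGTGACTT", "CCCC", "tgac"]
    print(passes_motif_count_filter(pwm, peaks, threshold=5.0, min_hits=3))
```

A production scan would also score the reverse complement and use a nucleotide background estimated from the genome; both are omitted here to keep the sketch short.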
Pages: 10