Statistics of protein-DNA binding and the total number of binding sites for a transcription factor in the mammalian genome

Cited by: 11
Authors
Kuznetsov, Vladimir A. [1 ]
Singh, Onkar [2 ]
Jenjaroenpun, Piroon [1 ]
Affiliations
[1] Bioinformat Inst, Dept Genome & Gene Express Data Anal, Singapore 138671, Singapore
[2] Natl Canc Ctr, Lab Clin Pharmacol, Div Med Sci, Singapore 169610, Singapore
Source
BMC GENOMICS | 2010 / Vol. 11
Keywords
CHIP-SEQ; GENE-EXPRESSION; WIDE ANALYSIS; SEQUENCE; MYC; SENSITIVITY; AVIDITY; NETWORK; STAT1;
DOI
10.1186/1471-2164-11-S1-S12
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Transcription factor (TF)-DNA binding loci are explored by analyzing massive datasets generated with Chromatin Immuno-Precipitation (ChIP)-based high-throughput sequencing technologies. These datasets suffer from biased information about binding loci availability, sample incompleteness, and diverse sources of technical and biological noise. Therefore, adequate mathematical models of ChIP-based high-throughput assays and statistical tools are required for a robust identification of specific and reliable TF binding sites (TFBS), a precise characterization of the TFBS avidity distribution, and a plausible estimation of the total number of specific TFBS for a given TF in the genome of a given cell type.
Results: We developed an exploratory mixture probabilistic model for specific and non-specific transcription factor-DNA (TF-DNA) binding. Within ChIP-seq data sets, the statistics of specific and non-specific DNA-protein binding are defined by a mixture of sample size-dependent skewed functions, described by the Kolmogorov-Waring (K-W) function (Kuznetsov, 2003) and an exponential function, respectively. Using available ChIP-seq data for eleven TFs essential for self-maintenance and differentiation of mouse embryonic stem cells (SC) (Nanog, Oct4, Sox2, Klf4, STAT3, E2F1, Tcfcp2l1, ZFX, n-Myc, c-Myc and Esrrb), reported in Chen et al. (2008), we estimated (i) the specificity and the sensitivity of the ChIP-seq binding assays and (ii) the number of specific binding sites (BSs) in the genome of mouse embryonic stem cells that were not identified in the current experiments. Motif finding analysis applied to the identified c-Myc TFBSs supports our results and allowed us to predict many novel c-Myc target genes.
Conclusion: We provide a novel methodology for estimating the specificity and the sensitivity of TF-DNA binding in massively parallel ChIP sequencing (ChIP-seq) binding assays. Goodness-of-fit analysis of K-W functions suggests that a large fraction of low- and moderate-avidity TFBSs cannot be identified by ChIP-based methods. Thus, the task of determining the binding sensitivity of a TF cannot yet be technically resolved by current ChIP-seq, compared to former experimental techniques. Considering our improvement in measuring the sensitivity and the specificity of the TFs obtained from the ChIP-seq data, the models of transcriptional regulatory networks in embryonic cells and other cell types derived from the given ChIP-seq data should be carefully revised.
Pages: 27
Related papers
39 in total
  • [1] [Anonymous], 1997, Discrete Multivariate Distributions
  • [2] Emergence of scaling in random networks
    Barabási, AL
    Albert, R
    [J]. SCIENCE, 1999, 286 (5439) : 509 - 512
  • [3] High-resolution profiling of histone methylations in the human genome
    Barski, Artem
    Cuddapah, Suresh
    Cui, Kairong
    Roh, Tae-Young
    Schones, Dustin E.
    Wang, Zhibin
    Wei, Gang
    Chepelev, Iouri
    Zhao, Keji
    [J]. CELL, 2007, 129 (04) : 823 - 837
  • [4] Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE)
    Bhinge, Akshay A.
    Kim, Jonghwan
    Euskirchen, Ghia M.
    Snyder, Michael
    Iyer, Vishwanath R.
    [J]. GENOME RESEARCH, 2007, 17 (06) : 910 - 916
  • [5] BINDING OF MYC PROTEINS TO CANONICAL AND NONCANONICAL DNA-SEQUENCES
    BLACKWELL, TK
    HUANG, J
    MA, A
    KRETZNER, L
    ALT, FW
    EISENMAN, RN
    WEINTRAUB, H
    [J]. MOLECULAR AND CELLULAR BIOLOGY, 1993, 13 (09) : 5216 - 5224
  • [6] ESTIMATING THE NUMBER OF SPECIES - A REVIEW
    BUNGE, J
    FITZPATRICK, M
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1993, 88 (421) : 364 - 373
  • [7] Integration of external signaling pathways with the core transcriptional network in embryonic stem cells
    Chen, Xi
    Xu, Han
    Yuan, Ping
    Fang, Fang
    Huss, Mikael
    Vega, Vinsensius B.
    Wong, Eleanor
    Orlov, Yuriy L.
    Zhang, Weiwei
    Jiang, Jianming
    Loh, Yuin-Han
    Yeo, Hock Chuan
    Yeo, Zhen Xuan
    Narang, Vipin
    Govindarajan, Kunde Ramamoorthy
    Leong, Bernard
    Shahab, Atif
    Ruan, Yijun
    Bourque, Guillaume
    Sung, Wing-Kin
    Clarke, Neil D.
    Wei, Chia-Lin
    Ng, Huck-Hui
    [J]. CELL, 2008, 133 (06) : 1106 - 1117
  • [8] NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence
    Down, TA
    Hubbard, TJP
    [J]. NUCLEIC ACIDS RESEARCH, 2005, 33 (05) : 1445 - 1453
  • [9] Stochastic models for aggregation processes
    Duerr, HP
    Dietz, K
    [J]. MATHEMATICAL BIOSCIENCES, 2000, 165 (02) : 135 - 145
  • [10] The Oncogenic EWS-FLI1 Protein Binds In Vivo GGAA Microsatellite Sequences with Potential Transcriptional Activation Function (Publication with Expression of Concern. See vol. 17, 2022)
    Guillon, Noelle
    Tirode, Franck
    Boeva, Valentina
    Zynovyev, Andrei
    Barillot, Emmanuel
    Delattre, Olivier
    [J]. PLOS ONE, 2009, 4 (03):