Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data

Cited by: 29
Authors
Levitsky, Victor G. [1 ,2 ]
Kulakovskiy, Ivan V. [3 ,4 ]
Ershov, Nikita I. [1 ]
Oshchepkov, Dmitry Yu [1 ]
Makeev, Vsevolod J. [3 ,4 ]
Hodgman, T. C. [5 ]
Merkulova, Tatyana I. [1 ,2 ]
Affiliations
[1] Russian Acad Sci, Inst Cytol & Genet, Siberian Div, Lavrentieva Prospect 10, Novosibirsk 630090, Russia
[2] Novosibirsk State Univ, Novosibirsk 630090, Russia
[3] Russian Acad Sci, Engelhardt Inst Mol Biol, Moscow 119991, Russia
[4] Russian Acad Sci, Vavilov Inst Gen Genet, Dept Computat Syst Biol, Moscow 119991, Russia
[5] Univ Nottingham, Sch Biosci, Multidisciplinary Ctr Integrat Biol, Sutton LE12 5RD, Surrey, England
Source
BMC GENOMICS | 2014 / Vol. 15
Funding
Russian Foundation for Basic Research;
Keywords
ChIP-Seq; EMSA; Transcription factor binding sites; FoxA; SiteGA; PWM; Transcription factor binding model; Dinucleotide frequencies; GLUCOCORTICOID-RECEPTOR; MOTIF DISCOVERY; PROTEIN; GENE; ELEMENTS; IDENTIFICATION; FAMILY; C/EBP; HNF3; TRRD;
DOI
10.1186/1471-2164-15-80
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. Many software tools implement different approaches to identifying TF binding sites (TFBSs) within ChIP-Seq peaks. However, their use for interpreting ChIP-Seq data is usually complicated by the absence of direct experimental verification, which makes it difficult both to set a threshold that avoids too many false-positive BSs and to compare the actual performance of different models.
Results: Using ChIP-Seq data for FoxA2 binding loci in adult mouse liver and human HepG2 cells, we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To select prediction thresholds for the models properly, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. Conventional position weight matrix (PWM) models performed worst, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by combining the SiteGA and diChIPMunk/ChIPMunk models, which correctly identified FoxA BSs in up to 90% of loci for both the mouse and human ChIP-Seq datasets.
Conclusions: Experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS recognition in ChIP-Seq data analysis. For ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared with the more sophisticated SiteGA and de novo motif discovery methods. A combination of models built on different principles allowed identification of proper TFBSs.
Pages: 12
Related papers
54 in total
  • [41] Rougemont J, 2012, METHODS MOL BIOL, V786, P263, DOI 10.1007/978-1-61779-292-2_16
  • [42] HEPATOCYTE NUCLEAR FACTOR-3 DETERMINES THE AMPLITUDE OF THE GLUCOCORTICOID RESPONSE OF THE RAT TYROSINE AMINOTRANSFERASE GENE
    ROUX, J
    PICTET, R
    GRANGE, T
    [J]. DNA AND CELL BIOLOGY, 1995, 14 (05) : 385 - 396
  • [43] Mechanisms of glucocorticoid signalling
    Schoneveld, OJLM
    Gaemers, IC
    Lamers, WH
    [J]. BIOCHIMICA ET BIOPHYSICA ACTA-GENE STRUCTURE AND EXPRESSION, 2004, 1680 (02) : 114 - 128
  • [44] DNA binding sites: representation and discovery
    Stormo, GD
    [J]. BIOINFORMATICS, 2000, 16 (01) : 16 - 23
  • [45] Identification and characterization of glucocorticoid receptor-binding sites in the human genome
    Taniguchi-Yanai, Keiko
    Koike, Yoshiko
    Hasegawa, Takashi
    Furuta, Yuichi
    Serizawa, Masakuni
    Ohshima, Noriko
    Kato, Norihiro
    Yanai, Kazuyuki
    [J]. JOURNAL OF RECEPTORS AND SIGNAL TRANSDUCTION, 2010, 30 (02) : 88 - 105
  • [46] RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets
    Thomas-Chollier, Morgane
    Herrmann, Carl
    Defrance, Matthieu
    Sand, Olivier
    Thieffry, Denis
    van Helden, Jacques
    [J]. NUCLEIC ACIDS RESEARCH, 2012, 40 (04) : e31
  • [47] Position dependencies in transcription factor binding sites
    Tomovic, Andrija
    Oakeley, Edward J.
    [J]. BIOINFORMATICS, 2007, 23 (08) : 933 - 941
  • [48] TRANSFAC_ Team, 2002, TRANSFAC REP, V3, P0001
  • [49] Extracting transcription factor targets from ChIP-Seq data
    Tuteja, Geetu
    White, Peter
    Schug, Jonathan
    Kaestner, Klaus H.
    [J]. NUCLEIC ACIDS RESEARCH, 2009, 37 (17) : e113 - e113
  • [50] Genome-wide discovery of functional transcription factor binding sites by comparative genomics: The case of Stat3
    Vallania, Francesco
    Schiavone, Davide
    Dewilde, Sarah
    Pupo, Emanuela
    Garbay, Serge
    Calogero, Raffaele
    Pontoglio, Marco
    Provero, Paolo
    Poli, Valeria
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (13) : 5117 - 5122