A systematic, large-scale comparison of transcription factor binding site models

Times cited: 15
Authors
Hombach, Daniela [1,2]
Schwarz, Jana Marie [1,2]
Robinson, Peter N. [3]
Schuelke, Markus [1,2]
Seelow, Dominik [1,2,4]
Affiliations
[1] Charité, Department of Neuropaediatrics, D-13353 Berlin, Germany
[2] Charité, NeuroCure Clinical Research Center, D-13353 Berlin, Germany
[3] Charité, Institute of Medical Genetics and Human Genetics, D-13353 Berlin, Germany
[4] Berlin Institute of Health (Berliner Institut für Gesundheitsforschung), Berlin, Germany
Source
BMC Genomics | 2016 / Vol. 17
Keywords
Transcription factor binding sites; TFBS prediction; PSSM; Genetic variation; RAPID EVOLUTION; GENE-REGULATION; DNA; DATABASE; SEQUENCES; IDENTIFICATION; REVEALS
DOI
10.1186/s12864-016-2729-8
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Modelling gene regulation is a major challenge in biomedical research. Regulation is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause genes to be misregulated, eventually leading to disease. The effects of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these matrices accurately represent in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls, we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.

Results: The area under the ROC curve (AUC) was low for most of the tested models, with only 47% reaching a value of 0.7 or higher. However, we observed strong differences between the various position-specific scoring matrices (PSSMs): JASPAR and HT-SELEX models showed higher success rates than PBM-derived models. In addition, although TFBS sequences showed a higher degree of conservation than randomly chosen sequences, conservation varied widely between individual TFBSs.

Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application, ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. ePOSSUM also indicates how reliable each prediction is, based on our test set of experimentally confirmed binding sites.
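The abstract describes three computational steps: scoring sequences with a binding matrix, comparing ChIP-seq-derived positives against exonic negatives by ROC analysis, and estimating how a DNA variant changes TF binding. The Python sketch below illustrates these steps under stated assumptions. It converts a count matrix into a log2-odds PSSM using a pseudocount and a uniform background (one common convention, not necessarily the one used in the study). It then takes the best window score on either strand and computes the AUC as the Mann-Whitney probability that a positive outscores a negative. Finally, it scores a variant as the change in best-window score. The toy motif, the sequences and all function names are hypothetical. The variant score is a simple delta heuristic, not ePOSSUM's Bayes classifier.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pssm(counts, pseudocount=0.8, background=None):
    """Turn a count matrix (one {base: count} dict per motif position) into a
    log2-odds PSSM. Pseudocount and uniform background are assumptions."""
    background = background or {b: 0.25 for b in BASES}
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount
        # Smoothed probability of each base, divided by its background frequency.
        pssm.append({
            b: math.log2((column[b] + pseudocount * background[b]) / total / background[b])
            for b in BASES
        })
    return pssm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def _best_forward(pssm, seq):
    """Highest score over all full-length windows on one strand."""
    width = len(pssm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        best = max(best, sum(pssm[i][b] for i, b in enumerate(window)))
    return best


def best_score(pssm, seq):
    """Best window score on either strand (a TF may bind either strand)."""
    seq = seq.upper()
    return max(_best_forward(pssm, seq), _best_forward(pssm, reverse_complement(seq)))


def roc_auc(pos_scores, neg_scores):
    """AUC as P(random positive scores above random negative), ties count 1/2
    (the Mann-Whitney formulation of the area under the ROC curve)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def variant_delta(pssm, ref_seq, pos, alt_base):
    """Change in best score when ref_seq[pos] is replaced by alt_base
    (negative = predicted loss of binding). A heuristic, not ePOSSUM's method."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(pssm, alt_seq) - best_score(pssm, ref_seq)


# Hypothetical toy motif with consensus GATA (not taken from JASPAR).
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]

if __name__ == "__main__":
    pssm = log_odds_pssm(TOY_COUNTS)
    positives = ["CCGATACC", "TTTGATAGG", "ACGATAAC"]  # stand-ins for ChIP-seq sites
    negatives = ["CCCCCCCC", "ACACACAC", "GGCCTTGG"]  # stand-ins for exonic controls
    auc = roc_auc([best_score(pssm, s) for s in positives],
                  [best_score(pssm, s) for s in negatives])
    print(f"AUC = {auc:.2f}")
    print(f"delta = {variant_delta(pssm, 'CCGATACC', 3, 'C'):.2f}")
```

With these toy inputs, every positive contains the GATA consensus and outscores every negative, so the AUC is 1.0. Mutating the central A of the site lowers the best score, giving a negative delta. The pairwise AUC loop is quadratic, which is fine for a sketch. For large test sets, a rank-based computation would be the more practical choice.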
Pages: 10