Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

Cited by: 38
Authors
Lahdesmaki, Harri [1 ]
Rust, Alistair G. [1 ]
Shmulevich, Ilya [1 ]
Affiliations
[1] Inst Syst Biol, Seattle, WA USA
Source
PLOS ONE | 2008 / Vol. 3 / Issue 03
DOI
10.1371/journal.pone.0001820
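Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code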
07 ; 0710 ; 09 ;
Abstract
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
A test data set, a web tool, source code, and supplementary data are available at: http://www.probtf.org.
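The contrast the abstract draws between scanning and estimating a probability of binding can be illustrated with a toy calculation. The sketch below is not the paper's model or the ProbTF implementation. It assumes a simplified mixture in which a promoter contains either no site or exactly one site, a position weight matrix (PWM) for the motif, a zero-order background, and an optional per-position prior standing in for additional evidence such as conservation. All function names, parameters, and numbers are hypothetical.

```python
import math


def site_likelihood_ratios(seq, pwm, background):
    """Likelihood ratio (motif vs. background) for every window of motif width."""
    width = len(pwm)
    ratios = []
    for start in range(len(seq) - width + 1):
        log_ratio = 0.0
        for offset in range(width):
            base = seq[start + offset]
            log_ratio += math.log(pwm[offset][base] / background[base])
        ratios.append(math.exp(log_ratio))
    return ratios


def binding_probability(seq, pwm, background, prior_bind=0.5, position_prior=None):
    """Posterior probability that seq contains one site (assumed toy model).

    position_prior holds non-negative weights, one per candidate start
    position; it is normalized here and defaults to uniform.
    """
    ratios = site_likelihood_ratios(seq, pwm, background)
    if not ratios:
        return 0.0  # sequence shorter than the motif
    if position_prior is None:
        position_prior = [1.0] * len(ratios)
    total = sum(position_prior)
    weights = [p / total for p in position_prior]
    # Marginal evidence for "one site somewhere", relative to "no site".
    evidence = sum(w * r for w, r in zip(weights, ratios))
    return prior_bind * evidence / (prior_bind * evidence + (1.0 - prior_bind))


# Hypothetical 3-bp motif favoring "TGA" and a uniform background.
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

p_uniform = binding_probability("CCTGACC", toy_pwm, uniform_bg)
# Extra evidence (for example, conservation) favoring start position 2.
p_informed = binding_probability(
    "CCTGACC", toy_pwm, uniform_bg, position_prior=[0.1, 0.1, 1.0, 0.1, 0.1]
)
p_no_motif = binding_probability("CCCCCCC", toy_pwm, uniform_bg)
```

In this toy setting, a position prior that agrees with the motif match raises the posterior relative to the uniform prior, which is one simple way that additional evidence sources can enter a probabilistic prediction.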
Pages: 24
Related papers
50 in total
  • [31] Functional inference from non-random distributions of conserved predicted transcription factor binding sites
    Dieterich, Christoph
    Rahmann, Sven
    Vingron, Martin
    BIOINFORMATICS, 2004, 20 : 109 - 115
  • [32] Novel Data Fusion Method and Exploration of Multiple Information Sources for Transcription Factor Target Gene Prediction
    Xiaofeng Dai
    Olli Yli-Harja
    Harri Lähdesmäki
    EURASIP Journal on Advances in Signal Processing, 2010
  • [33] Novel Data Fusion Method and Exploration of Multiple Information Sources for Transcription Factor Target Gene Prediction
    Dai, Xiaofeng
    Yli-Harja, Olli
    Lahdesmaki, Harri
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2010
  • [34] FROM BINDING MOTIFS IN CHIP-SEQ DATA TO IMPROVED MODELS OF TRANSCRIPTION FACTOR BINDING SITES
    Kulakovskiy, Ivan
    Levitsky, Victor
    Oshchepkov, Dmitry
    Bryzgalov, Leonid
    Vorontsov, Ilya
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2013, 11 (01)
  • [35] METHOD OF PROBABILISTIC INFERENCE FROM LEARNING DATA IN BAYESIAN NETWORKS
    Terent'yev, A. N.
    Biduk, P. I.
    CYBERNETICS AND SYSTEMS ANALYSIS, 2007, 43 (03) : 391 - 396
  • [36] INFERENCE OF STRUCTURES OF MODELS OF PROBABILISTIC DEPENDENCES FROM STATISTICAL DATA
    Balabanov, A. S.
    CYBERNETICS AND SYSTEMS ANALYSIS, 2005, 41 (06) : 808 - 817
  • [37] Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data
    Xu, Youhong
    Yuan, Changan
    Wu, Hongjie
    Zhao, Xingming
    INTELLIGENT COMPUTING THEORIES AND APPLICATION (ICIC 2022), PT I, 2022, 13393 : 799 - 810
  • [38] Integration of Multiple Data Sources for Gene Network Inference using Genetic Perturbation Data
    Liang, Xiao
    Young, William Chad
    Hung, Ling-Hong
    Raftery, Adrian E.
    Yeung, Ka Yee
    ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2018, : 601 - 602
  • [39] Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data
    Liang, Xiao
    Young, William Chad
    Hung, Ling-Hong
    Raftery, Adrian E.
    Yeung, Ka Yee
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26 (10) : 1113 - 1129
  • [40] Pinpointing transcription factor binding sites from ChIP-seq data with SeqSite
    Wang, Xi
    Zhang, Xuegong
    BMC SYSTEMS BIOLOGY, 2011, 5