Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

Cited by: 38
Authors
Lahdesmaki, Harri [1]
Rust, Alistair G. [1]
Shmulevich, Ilya [1]
Affiliations
[1] Institute for Systems Biology, Seattle, WA, USA
Source
PLOS ONE | 2008 / Vol. 3 / Issue 03
DOI
10.1371/journal.pone.0001820
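Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09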
Abstract
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments, and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an online web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
A test data set, the web tool, source code, and supplementary data are available at: http://www.probtf.org.
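The idea in the abstract of estimating a promoter-level binding probability, with additional evidence entering as a prior over positions, can be illustrated with a deliberately simplified sketch. This is not the authors' model or the ProbTF implementation: it assumes at most one binding site per sequence, a single position weight matrix, and a zero-order uniform background. The function name `binding_posterior`, the parameter `p_bound`, and the example motif and weights are all hypothetical.

```python
from math import prod

# Assumed zero-order background model (uniform nucleotide frequencies).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def binding_posterior(seq, pwm, position_prior, p_bound=0.5, background=BACKGROUND):
    """Posterior probability that `seq` contains one binding site.

    Simplified "at most one site" mixture: with prior probability `p_bound`
    the sequence holds exactly one site whose start position follows
    `position_prior` (for example, weights derived from conservation);
    otherwise every base is drawn from the background model.
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if n_starts <= 0:
        raise ValueError("sequence is shorter than the motif")
    if len(position_prior) != n_starts:
        raise ValueError("position_prior needs one weight per start position")

    # Normalize the evidence weights into a distribution over start positions.
    total = sum(position_prior)
    weights = [w / total for w in position_prior]

    # P(seq | bound) / P(seq | background): bases outside the site cancel,
    # leaving a prior-weighted sum of motif-to-background ratios.
    ratio = 0.0
    for start, weight in enumerate(weights):
        site_ratio = prod(
            pwm[j][seq[start + j]] / background[seq[start + j]]
            for j in range(width)
        )
        ratio += weight * site_ratio

    # Bayes' rule for the two hypotheses (bound vs. background only).
    return p_bound * ratio / (p_bound * ratio + (1.0 - p_bound))


# Hypothetical two-column motif that prefers "AC".
EXAMPLE_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

uniform = binding_posterior("TTACTT", EXAMPLE_PWM, [1, 1, 1, 1, 1])
informed = binding_posterior("TTACTT", EXAMPLE_PWM, [0.05, 0.05, 0.8, 0.05, 0.05])
print(f"uniform prior:  {uniform:.3f}")
print(f"informed prior: {informed:.3f}")
```

In this toy example, concentrating the position prior on the start of the "AC" match raises the posterior relative to a uniform prior, which is the qualitative effect that fusing an extra evidence source is meant to have.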
Pages: 24
Related papers
  • [1] Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites
    Wong, Ka-Chun
    Peng, Chengbin
    Li, Yue
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2015, 12 (06) : 1416 - 1428
  • [2] Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data
    Pique-Regi, Roger
    Degner, Jacob F.
    Pai, Athma A.
    Gaffney, Daniel J.
    Gilad, Yoav
    Pritchard, Jonathan K.
    GENOME RESEARCH, 2011, 21 (03) : 447 - 455
  • [3] Probabilistic inference of molecular networks from noisy data sources
    Iossifov, I
    Krauthammer, M
    Friedman, C
    Hatzivassiloglou, V
    Bader, JS
    White, KP
    Rzhetsky, A
    BIOINFORMATICS, 2004, 20 (08) : 1205 - 1213
  • [4] Semiparametric inference for merged data from multiple data sources
    Saegusa, Takumi
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2022, 216 : 1 - 14
  • [5] TFInfer: a tool for probabilistic inference of transcription factor activities
    Asif, H. M. Shahzad
    Rolfe, Matthew D.
    Green, Jeff
    Lawrence, Neil D.
    Rattray, Magnus
    Sanguinetti, Guido
    BIOINFORMATICS, 2010, 26 (20) : 2635 - 2636
  • [6] Probabilistic framework for transcription factor binding prediction
    Laehdesmaeki, Harri
    Shmulevich, Ilya
    2007 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS, 2007, : 95 - 98
  • [7] Fusion and inference from multiple data sources in a commensurate space
    Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, United States
    Unknown
    Stat. Anal. Data Min., 2012, 3 (187-193):
  • [8] Integrating multiple evidence sources to predict transcription factor binding in the human genome
    Ernst, Jason
    Plasterer, Heather L.
    Simon, Itamar
    Bar-Joseph, Ziv
    GENOME RESEARCH, 2010, 20 (04) : 526 - 536
  • [9] Reducing redundancy of input data sets to improve inference of transcription factor binding sites
    Vychyk, Pavel
    Nikolaichik, Yevgeny
    BMC BIOINFORMATICS, 2020, 21 (SUPPL 20):
  • [10] Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources
    Praveen, Paurush
    Froehlich, Holger
    PLOS ONE, 2013, 8 (06):