A fully Bayesian hidden Ising model for ChIP-seq data analysis

Cited by: 12
Author
Mo, Qianxing [1]
Affiliation
[1] Mem Sloan Kettering Canc Ctr, Dept Epidemiol & Biostat, New York, NY 10065 USA
Keywords
ChIP-seq; Ising model; Markov random fields; Massively parallel sequencing; Next generation sequencing; BINDING-SITES; DNA; ALGORITHM
DOI
10.1093/biostatistics/kxr029
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a powerful technique used in a wide range of biological studies, including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data and the biases introduced by sequencing and/or genome mapping pose new challenges and call for effective statistical methods and fast software. To model ChIP-seq data systematically, we build a dynamic signal profile for each chromosome and then model the profile with a fully Bayesian hidden Ising model. The proposed model naturally accounts for spatial dependency and for both the global and local distributions of sequence tags, and it supports both one-sample and two-sample analyses. Through model diagnosis, the method can detect falsely enriched regions caused by sequencing and/or mapping errors. Existing hypothesis-testing-based methods usually do not offer this capability. The method is illustrated on three transcription factor (TF) ChIP-seq data sets and two mixed ChIP-seq data sets. It is compared with four popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting TF binding sites, at a much lower false discovery rate.
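The abstract does not give the model's equations. The following is therefore only a minimal, generic sketch of how a hidden Ising model can label binned read counts as enriched or background. It is not the author's method, and it rests on several assumptions:

- The "dynamic signal profile" is replaced by fixed-width bin counts of tag start positions.
- Each bin carries a latent spin x_i in {-1, +1}, where +1 means enriched.
- The spins follow a nearest-neighbour Ising prior with a fixed coupling beta.
- Counts are Poisson with a state-specific rate under conjugate Gamma priors.
- Inference is plain Gibbs sampling.

The names `bin_counts` and `gibbs_hidden_ising` are hypothetical.

```python
import math
import random


def bin_counts(tag_starts, chrom_length, bin_size=50):
    """Hypothetical helper: count tag start positions per fixed-width bin.
    This is a simple static stand-in for a per-chromosome signal profile."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in tag_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_hidden_ising(y, beta=1.0, n_iter=500, burn_in=200,
                       prior_shape=1.0, prior_rate=1.0, seed=0):
    """Hypothetical Gibbs sampler for a 1-D hidden Ising chain (assumptions:
    P(x) ∝ exp(beta * sum_i x_i * x_{i+1}), y_i | x_i ~ Poisson(lam[x_i]),
    Gamma(prior_shape, prior_rate) priors on the rates, beta held fixed).
    Returns the posterior probability that each bin is enriched (x_i = +1)."""
    rng = random.Random(seed)
    n = len(y)
    mean_y = sum(y) / n
    # Initialise: bins above the mean count start in the enriched state.
    x = [1 if v > mean_y else -1 for v in y]
    lam = {-1: max(mean_y / 2, 0.1), 1: max(mean_y * 2, 0.2)}
    hits = [0] * n
    for it in range(n_iter):
        # 1) Update each spin given its neighbours and its own count.
        for i in range(n):
            nb = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
            # Unnormalised log full conditionals; log(y_i!) cancels.
            lp_up = beta * nb + y[i] * math.log(lam[1]) - lam[1]
            lp_dn = -beta * nb + y[i] * math.log(lam[-1]) - lam[-1]
            x[i] = 1 if rng.random() < _sigmoid(lp_up - lp_dn) else -1
        # 2) Draw each Poisson rate from its conjugate Gamma full conditional.
        for s in (-1, 1):
            ys = [y[i] for i in range(n) if x[i] == s]
            shape = prior_shape + sum(ys)
            rate = prior_rate + len(ys)
            lam[s] = max(rng.gammavariate(shape, 1.0 / rate), 1e-12)
        # 3) Resolve label switching so the enriched state has the higher rate.
        if lam[1] < lam[-1]:
            lam[1], lam[-1] = lam[-1], lam[1]
            x = [-s for s in x]
        if it >= burn_in:
            for i in range(n):
                hits[i] += x[i] == 1
    return [h / (n_iter - burn_in) for h in hits]
```

On a toy profile such as `[0]*40 + [20]*10 + [0]*40`, this sketch assigns high enrichment probability to the ten middle bins and low probability elsewhere. The coupling `beta` is what encodes spatial dependency: a larger value makes neighbouring bins more likely to share a state, which smooths away isolated spikes. A fully Bayesian treatment would also place a prior on `beta`; that step is omitted here for brevity.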
Pages: 113-128
Page count: 16