A fully Bayesian hidden Ising model for ChIP-seq data analysis

Cited by: 12

Authors:

Mo, Qianxing [1]

Affiliations:

[1] Mem Sloan Kettering Canc Ctr, Dept Epidemiol & Biostat, New York, NY 10065 USA

Source:

BIOSTATISTICS | 2012 / Vol. 13 / Issue 01

Keywords:

ChIP-seq; Ising model; Markov random fields; Massively parallel sequencing; Next generation sequencing; BINDING-SITES; DNA; ALGORITHM;

DOI:

10.1093/biostatistics/kxr029

Chinese Library Classification (CLC) number:

Q [Biological sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a powerful technique that is being used in a wide range of biological studies, including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data and the biases introduced by sequencing and/or genome mapping pose new challenges and call for effective methods and fast computer programs for statistical analysis. To systematically model ChIP-seq data, we build a dynamic signal profile for each chromosome and then model the profile using a fully Bayesian hidden Ising model. The proposed model naturally takes into account spatial dependency and the global and local distributions of sequence tags. It can be used for one-sample and two-sample analyses. Through model diagnosis, the proposed method can detect falsely enriched regions caused by sequencing and/or mapping errors, a capability that existing hypothesis-testing-based methods usually do not offer. The proposed method is illustrated using 3 transcription factor (TF) ChIP-seq data sets and 2 mixed ChIP-seq data sets and compared with 4 popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting TF binding sites, at a much lower false discovery rate.

Pages: 113-128

Page count: 16
