A Generalized Linear Model for Peak Calling in ChIP-Seq Data

Cited by: 4
Authors
Xu, Jialin [1]
Zhang, Yu [1]
Affiliation
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
Keywords
generalized linear model; ChIP-Seq; peak calling
DOI
10.1089/cmb.2012.0023
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) has become a routine method for detecting genome-wide protein-DNA interactions. The success of ChIP-Seq data analysis depends heavily on the quality of peak calling (i.e., detecting peaks of tag counts at a genomic location and evaluating whether each peak corresponds to a real protein-DNA interaction event). The challenges in peak calling include (1) how to combine the forward and reverse strand tag data to improve the power of peak calling and (2) how to account for the variation in tag data observed across different genomic locations. We introduce a new peak calling method based on the generalized linear model (GLMNB) that uses the negative binomial distribution to model the tag count data and to account for the variation in background tags, which may randomly bind to the DNA sequence at varying levels because of local genomic structure and sequence content. We allow local shifting of the peaks observed on the forward and reverse strands, so that at each potential binding site a binding profile representing the pattern of a real peak signal is fitted to best explain the observed tag data by maximum likelihood. Our method can also detect multiple peaks within a local region if the region contains multiple binding sites.
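The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' GLMNB implementation: it only shows, under simplifying assumptions, how a negative binomial likelihood with a log link can score a shifted peak profile on the two strands against a background-only model. The Gaussian-shaped profile, the fixed (not estimated) coefficients, the toy counts, and all function names (`nb_logpmf`, `profile`, `best_peak`) are assumptions made for illustration.

```python
import math

def nb_logpmf(k, mu, size):
    """Log pmf of a negative binomial with mean mu and variance mu + mu**2 / size."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))

def loglik(counts, mus, size):
    """Sum of per-bin negative binomial log-likelihoods."""
    return sum(nb_logpmf(k, m, size) for k, m in zip(counts, mus))

def profile(n, center, width=2.0):
    """Hypothetical Gaussian-shaped binding profile over n bins (an assumption)."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

def peak_means(n, center, background, effect):
    """Log-link mean: log(mu) = log(background) + effect * profile."""
    return [background * math.exp(effect * p) for p in profile(n, center)]

def best_peak(fwd, rev, background, effect, size, max_shift=5):
    """Scan binding-site centre and strand shift; return (log-likelihood ratio, centre, shift).

    The forward-strand peak is placed at centre - shift and the reverse-strand
    peak at centre + shift. Coefficients are fixed here to keep the sketch short;
    a real fit would estimate them by maximum likelihood.
    """
    n = len(fwd)
    flat = [background] * n
    null = loglik(fwd, flat, size) + loglik(rev, flat, size)
    best = (float("-inf"), None, None)
    for c in range(n):
        for s in range(max_shift + 1):
            ll = (loglik(fwd, peak_means(n, c - s, background, effect), size)
                  + loglik(rev, peak_means(n, c + s, background, effect), size))
            if ll - null > best[0]:
                best = (ll - null, c, s)
    return best

# Toy data: forward-strand peak at bin 8, reverse-strand peak at bin 12.
BACKGROUND, EFFECT, SIZE = 2.0, math.log(10.0), 5.0
fwd_counts = [round(m) for m in peak_means(21, 8, BACKGROUND, EFFECT)]
rev_counts = [round(m) for m in peak_means(21, 12, BACKGROUND, EFFECT)]
llr, centre, shift = best_peak(fwd_counts, rev_counts, BACKGROUND, EFFECT, SIZE)
print(llr, centre, shift)
```

A region containing several binding sites would need a mean with more than one profile term, which this sketch does not attempt.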
Pages: 826-838
Page count: 13