A Generalized Linear Model for Peak Calling in ChIP-Seq Data

Cited by: 4
Authors
Xu, Jialin [1]
Zhang, Yu [1]
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
Keywords
generalized linear model; ChIP-Seq; peak calling
DOI
10.1089/cmb.2012.0023
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) has become a routine method for detecting genome-wide protein-DNA interactions. The success of ChIP-Seq data analysis depends heavily on the quality of peak calling (i.e., detecting peaks of tag counts at a genomic location and evaluating whether each peak corresponds to a real protein-DNA interaction event). The challenges in peak calling include (1) how to combine the forward and reverse strand tag data to improve the power of peak calling and (2) how to account for the variation of tag data observed across different genomic locations. We introduce a new peak calling method based on the generalized linear model (GLMNB) that uses the negative binomial distribution to model the tag count data and to account for the variation of background tags that may randomly bind to the DNA sequence at varying levels due to local genomic structure and sequence content. We allow local shifting of the peaks observed on the forward and reverse strands, such that at each potential binding site, a binding profile representing the pattern of a real peak signal is fitted to best explain the observed tag data by maximum likelihood. Our method can also detect multiple peaks within a local region if the region contains multiple binding sites.
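The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GLMNB implementation: the triangular binding profile, the fixed dispersion `size`, the median-based background estimate, the grid search, and all function names are assumptions made for illustration. It models per-bin tag counts as negative binomial with a log link, places the forward-strand peak upstream and the reverse-strand peak downstream of a candidate site, and picks the site, strand shift, and peak height that maximize the joint likelihood.

```python
import math
from statistics import median


def nb_logpmf(y, mu, size):
    """Log-probability of count y under a negative binomial with mean mu and dispersion `size`."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def profile(n, center, half_width):
    """Hypothetical triangular binding profile: 1 at `center`, 0 at and beyond `half_width`."""
    return [max(0.0, 1.0 - abs(i - center) / half_width) for i in range(n)]


def strand_loglik(counts, shape, b0, b1, size):
    """NB log-likelihood under a log link: log(mu_i) = b0 + b1 * shape_i."""
    return sum(nb_logpmf(y, math.exp(b0 + b1 * s), size)
               for y, s in zip(counts, shape))


def fit_peak(fwd, rev, half_width=3, size=5.0,
             shifts=(0, 1, 2, 3), b1_grid=(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """Grid-search binding site, strand shift, and peak height by maximum likelihood."""
    n = len(fwd)
    # Assumption: estimate the background from the median count, which a narrow peak barely moves.
    b0 = math.log(max(median(list(fwd) + list(rev)), 0.5))
    flat = [0.0] * n
    null_ll = (strand_loglik(fwd, flat, b0, 0.0, size)
               + strand_loglik(rev, flat, b0, 0.0, size))
    best = None
    for center in range(n):
        for shift in shifts:
            # Forward-strand tags pile up upstream of the site, reverse-strand tags downstream.
            f_shape = profile(n, center - shift, half_width)
            r_shape = profile(n, center + shift, half_width)
            for b1 in b1_grid:
                ll = (strand_loglik(fwd, f_shape, b0, b1, size)
                      + strand_loglik(rev, r_shape, b0, b1, size))
                if best is None or ll > best["loglik"]:
                    best = {"center": center, "shift": shift, "b1": b1, "loglik": ll}
    # Likelihood-ratio statistic of the best peak model against background only.
    best["lr"] = 2.0 * (best["loglik"] - null_ll)
    return best


# Toy tag counts per bin, made up for illustration only.
fwd = [2] * 6 + [4, 8, 15, 8, 4] + [2] * 10
rev = [2] * 10 + [4, 8, 15, 8, 4] + [2] * 6
result = fit_peak(fwd, rev)
print(result["center"], result["shift"], result["b1"])
```

A grid search keeps the sketch dependency-free. A fuller treatment would estimate the dispersion and coefficients jointly (for example by iteratively reweighted least squares) and would scan for several peaks per region, as the abstract describes.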
Pages: 826-838
Page count: 13