An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Cited by: 15
Authors
Jenkinson, Garrett [1, 2]
Abante, Jordi [1]
Feinberg, Andrew P. [2, 3, 4]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Source
BMC BIOINFORMATICS | 2018 / Volume 19
Keywords
DNA methylation; Genome analysis; Information theory; Ising model; Methylation analysis; WGBS data modeling and analysis; DIFFERENTIALLY METHYLATED REGIONS; FALSE DISCOVERY RATE; DNA METHYLATION; CPG ISLANDS; OPTIMIZATION; POWERFUL; GENES;
DOI
10.1186/s12859-018-2086-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical [...]
Results: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
Conclusions: This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
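As a rough illustration of the two information-theoretic quantities named in the abstract, the following minimal sketch computes the Shannon entropy of a discrete distribution and the Jensen-Shannon distance between two such distributions. It is not the authors' implementation: the function names and the example distributions are hypothetical, chosen only for illustration, and the Ising-model step that would produce the methylation distributions in the paper is not reproduced here.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical distributions of the methylation level in one genomic region
# (assumed values, not taken from the paper).
reference = [0.7, 0.2, 0.1]
test = [0.1, 0.2, 0.7]

print("entropy (reference):", shannon_entropy(reference))
print("JS distance:", jensen_shannon_distance(reference, test))
```

With base-2 logarithms the distance is 0 for identical distributions and 1 for distributions with disjoint support, which makes values comparable across regions.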
Pages: 23