BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues

Cited by: 36
Authors
Zou, Luli S. [1]
Erdos, Michael R. [1]
Taylor, D. Leland [1,2]
Chines, Peter S. [1]
Varshney, Arushi [3]
Parker, Stephen C. J. [3,5]
Collins, Francis S. [1]
Didion, John P. [1]
Affiliations
[1] NHGRI, NIH, Bethesda, MD 20892 USA
[2] Wellcome Genome Campus, European Bioinformat Inst, European Mol Biol Lab, Hinxton, Cambs, England
[3] Univ Michigan, Dept Human Genet, Ann Arbor, MI 48109 USA
[4] Washington Univ, Sch Med, St Louis, MO 63108 USA
[5] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
Source
BMC GENOMICS | 2018 / Volume 19
Keywords
DNA methylation; XGBoost; Whole-genome bisulfite sequencing (WGBS); EPIC; Imputation; Adipose; Skeletal muscle; Pancreatic islets; GENOTYPE IMPUTATION; WIDE ASSOCIATION; BINDING; GENE; SUSCEPTIBILITY; IDENTIFICATION; SIGNATURE; ORIGIN;
DOI
10.1186/s12864-018-4766-y
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power. Results: Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation. Conclusions: Our findings support the use of BoostMe as a preprocessing step for WGBS analysis.
Pages: 15