Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types

Cited by: 7
Authors
Bulla, Ingo [1, 2]
Aliaga, Benoit [3]
Lacal, Virginia [4]
Bulla, Jan [4]
Grunau, Christoph [3]
Chaparro, Cristian [3]
Affiliations
[1] Ernst Moritz Arndt Univ Greifswald, Inst Math & Informt, Walther Rathenau Str 47, D-17487 Greifswald, Germany
[2] Los Alamos Natl Lab, Theoret Biol & Biophys, Grp T6, Los Alamos, NM USA
[3] Univ Montpellier, Univ Perpignan, CNRS, IHPE UMR 5244,IFREMER, Via Domitia,58 Ave Paul Alduy, F-66860 Perpignan, France
[4] Univ Bergen, Dept Math, POB 7803, N-5020 Bergen, Norway
Keywords
Epigenetics; DNA methylation; Kernel density estimation; CpG o/e ratio; CpN o/e ratio; GENOME-WIDE; CYTOSINE METHYLATION; GENE-EXPRESSION; DRAFT GENOME; EVOLUTION; METHYLOMES; INSIGHTS; HONEYBEE; ISLANDS; MODEL;
DOI
10.1186/s12859-018-2115-4
CLC number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. However, the relatively high costs and technical challenges associated with detecting DNA methylation have biased the number of methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types based on the higher mutability of methylated CpG.
Results: Predominantly model-based approaches, essentially founded on mixtures of Gaussian distributions, are currently used to investigate questions related to the number and position of modes of CpG o/e ratios. These approaches require the selection of an appropriate criterion for determining the best model and will fail if empirical distributions are complex or even merely moderately skewed. We use a kernel density estimation (KDE) based technique for robust and precise characterization of complex CpN o/e distributions without a priori assumptions about the underlying distributions.
Conclusions: We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool, called Notos and available at the ToolShed, that calculates these ratios for input FASTA files and fits a density to their empirical distribution. Based on the estimated density, the number and shape of the modes of the distribution are determined, providing a rationale for predicting the number and types of different methylation classes. Notos is written in R and Perl.
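The two steps the abstract describes, computing a CpN o/e ratio per sequence and counting the modes of a kernel density estimate, can be illustrated with a minimal Python sketch. This is not the Notos implementation (which is written in R and Perl). The function names `cpn_oe`, `gaussian_kde`, and `count_modes` are hypothetical. The ratio uses one common normalization (observed dinucleotide count times sequence length, divided by the product of the two base counts), which may differ from the formula Notos applies. The bandwidth is a fixed value chosen by hand, and the ratios in the demo are toy values, not real data.

```python
import math


def cpn_oe(seq, second="G"):
    """Observed/expected ratio of the dinucleotide C followed by `second`.

    Assumed normalization: (count of CpN * length) / (count of C * count of N).
    Returns None when the expected count would be zero.
    """
    seq = seq.upper()
    n = len(seq)
    n_c = seq.count("C")
    n_s = seq.count(second)
    if n_c == 0 or n_s == 0:
        return None
    # Count dinucleotides position by position so overlapping hits are included.
    observed = sum(
        1 for i in range(n - 1) if seq[i] == "C" and seq[i + 1] == second
    )
    return observed * n / (n_c * n_s)


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def count_modes(density):
    """Count strict local maxima of a density sampled on a regular grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


if __name__ == "__main__":
    # Toy o/e ratios forming two clusters (illustrative values only).
    ratios = [0.28, 0.30, 0.32, 0.98, 1.00, 1.02]
    grid = [i * 0.01 for i in range(151)]
    density = gaussian_kde(ratios, grid, bandwidth=0.05)
    print("CpG o/e of ACGTCGAA:", cpn_oe("ACGTCGAA"))
    print("modes found:", count_modes(density))
```

In practice the bandwidth strongly affects how many modes appear, so a data-driven bandwidth selector would replace the hand-picked value used here.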
Pages: 13