Predicting enhancers in mammalian genomes using supervised hidden Markov models

Cited by: 10
Authors
Zehnder, Tobias [1]
Benner, Philipp [1]
Vingron, Martin [1]
Affiliations
[1] Max Planck Inst Mol Genet, Ihnestr 63-73, D-14195 Berlin, Germany
Keywords
Enhancer prediction; Epigenetics; Gene regulation; Supervised hidden Markov models; CHROMATIN-STRUCTURE; DNA METHYLATION; ELEMENTS; REVEALS; WIDE; TRANSCRIPTION; ANNOTATION; EXPRESSION; PROMOTERS; DISCOVERY
DOI
10.1186/s12859-019-2708-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Eukaryotic gene regulation is a complex process comprising the dynamic interaction of enhancers and promoters in order to activate gene expression. In recent years, research in regulatory genomics has contributed to a better understanding of the characteristics of promoter elements, and for most sequenced model organism genomes there exist comprehensive and reliable promoter annotations. For enhancers, however, a reliable description of their characteristics and location has so far proven elusive. With the development of high-throughput methods such as ChIP-seq, large amounts of data about epigenetic conditions have become available, and many existing methods use information on chromatin accessibility or histone modifications to train classifiers that segment the genome into functional groups such as enhancers and promoters. However, these methods often do not consider prior biological knowledge about enhancers, such as their diverse lengths or molecular structure.
Results: We developed enhancer HMM (eHMM), a supervised hidden Markov model designed to learn the molecular structure of promoters and enhancers. Both consist of a central stretch of accessible DNA flanked by nucleosomes with distinct histone modification patterns. We evaluated the performance of eHMM within and across cell types and developmental stages and found that eHMM successfully predicts enhancers with high precision and recall comparable to state-of-the-art methods, and consistently outperforms those methods in terms of accuracy and resolution.
Conclusions: eHMM predicts active enhancers based on data from chromatin accessibility assays and a minimal set of histone modification ChIP-seq experiments. In contrast to 'black box' methods, its parameters are easy to interpret. eHMM can be used as a stand-alone tool for enhancer prediction without the need for additional training or parameter tuning. The high spatial precision of enhancer predictions gives valuable targets for potential knockout experiments or downstream analyses such as motif search.
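The structure described in the abstract (accessible DNA flanked by nucleosomes) can be illustrated with a toy hidden Markov model. The sketch below is not the eHMM implementation: the state names, the three-symbol observation alphabet (0 = no signal, 1 = histone-mark signal, 2 = accessibility signal), and all probabilities are assumptions invented for illustration. It only shows how a left-to-right state topology plus Viterbi decoding can recover an element and its boundaries from a binned signal track.

```python
import math

# Hypothetical states: one background state and a left-to-right
# "nucleosome, accessible, nucleosome" element. Names are illustrative.
STATES = ["background", "nucleosome_up", "accessible", "nucleosome_down"]
NEG_INF = float("-inf")

# Invented transition probabilities; missing entries are forbidden moves.
TRANS = {
    "background": {"background": 0.9, "nucleosome_up": 0.1},
    "nucleosome_up": {"nucleosome_up": 0.7, "accessible": 0.3},
    "accessible": {"accessible": 0.7, "nucleosome_down": 0.3},
    "nucleosome_down": {"nucleosome_down": 0.7, "background": 0.3},
}

# Invented emission probabilities over symbols 0, 1, 2
# (no signal, histone-mark signal, accessibility signal).
EMIT = {
    "background": [0.90, 0.05, 0.05],
    "nucleosome_up": [0.10, 0.80, 0.10],
    "accessible": [0.10, 0.10, 0.80],
    "nucleosome_down": [0.10, 0.80, 0.10],
}

# Assumption: every sequence starts in the background state.
START = {"background": 1.0}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs):
    """Return the most probable state path for a list of symbols."""
    score = {s: log(START.get(s, 0.0)) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # one dict of back-pointers per step after the first
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + log(TRANS[p].get(s, 0.0))
            )
            new_score[s] = (
                score[best_prev]
                + log(TRANS[best_prev].get(s, 0.0))
                + log(EMIT[s][symbol])
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def element_intervals(path):
    """Return half-open (start, end) bin intervals of non-background runs."""
    intervals, start = [], None
    for i, state in enumerate(path):
        if state != "background" and start is None:
            start = i
        elif state == "background" and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(path)))
    return intervals


if __name__ == "__main__":
    track = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
    decoded = viterbi(track)
    print(decoded)
    print(element_intervals(decoded))
```

Because the element states can only be traversed in order, a decoded element always contains an accessible core between two flanking nucleosome segments, which is one way such a topology can encode prior structural knowledge directly in the model.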
Pages: 12