Genome-wide prediction of cis-regulatory regions using supervised deep learning methods

Cited by: 74
Authors
Li, Yifeng [1, 2]
Shi, Wenqiang [1]
Wasserman, Wyeth W. [1]
Affiliations
[1] Univ British Columbia, Dept Med Genet, BC Childrens Hosp, Ctr Mol Med & Therapeut,Res Inst, Rm 3109,950 West 28th Ave, Vancouver, BC V5Z 4H4, Canada
[2] Natl Res Council Canada, Digital Technol Res Ctr, Bldg M-50,1200 Montreal Rd, Ottawa, ON K1A 0R6, Canada
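Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada; Canada Foundation for Innovation; National Institutes of Health (USA);
Keywords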
cis-regulatory region; Enhancer; Promoter; Deep learning; TRANSCRIPTION FACTORS; UNIFIED ARCHITECTURE; ENHANCERS; ELEMENTS; IDENTIFICATION; DNA; DISSECTION; INITIATION; PROMOTERS; DISCOVERY;
DOI
10.1186/s12859-018-2187-1
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions that precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Advances in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome-wide. Results: Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, which uses supervised deep learning approaches to identify enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, their introduction enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate the locations of 300,000 candidate enhancers genome-wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome). Conclusion: The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and should inspire the development of other advanced neural network models to further improve genome annotations.
Pages: 14