DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

Times cited: 554
Authors
Quang, Daniel [1, 2]
Xie, Xiaohui [1, 2]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA 92697 USA
Funding
U.S. National Science Foundation
Keywords
GENOME-WIDE ASSOCIATION; VARIANTS
DOI
10.1093/nar/gkw226
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA could greatly benefit both basic science and translational research, because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' that improves predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve a relative improvement of over 50% in the area under the precision-recall curve compared to related models.
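The division of labor described in the abstract (a convolution layer that scans for motifs, then a bidirectional LSTM that relates those motifs to each other) can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: all layer sizes are toy values rather than DanQ's published hyperparameters, the weights are random and untrained, the helper names (`one_hot`, `conv1d_relu`, `max_pool`, `bilstm`, `predict`) are hypothetical, and flattening the BiLSTM output into a single sigmoid layer is a simplification assumed here. Each output is an independent probability, assuming a multi-label setup in which one sequence can carry several regulatory marks at once.

```python
import numpy as np

# Toy sizes for illustration only (assumed; NOT DanQ's published hyperparameters).
SEQ_LEN, N_FILTERS, FILTER_W, POOL, HIDDEN, N_TARGETS = 100, 8, 5, 4, 6, 3
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def one_hot(seq):
    """Encode an A/C/G/T string as a (len, 4) matrix; any other base is an all-zero row."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        if base in idx:
            x[i, idx[base]] = 1.0
    return x


def conv1d_relu(x, w, b):
    """Valid 1D convolution + ReLU; each of the w.shape[0] filters acts as a motif scanner."""
    width = w.shape[1]  # w: (filters, width, 4)
    out = np.empty((x.shape[0] - width + 1, w.shape[0]))
    for t in range(out.shape[0]):
        out[t] = np.tensordot(w, x[t:t + width], axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis (trailing remainder dropped)."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)


def lstm(xs, W, U, b):
    """Plain LSTM over the rows of xs; returns all hidden states, shape (T, H)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x in xs:
        # Gate pre-activations: input, forget, candidate, output.
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.array(states)


def bilstm(xs, fwd, bwd):
    """Run one LSTM left-to-right and one right-to-left, then concatenate per position."""
    return np.concatenate([lstm(xs, *fwd), lstm(xs[::-1], *bwd)[::-1]], axis=1)


def init_lstm(in_dim, hidden):
    return (rng.normal(0, 0.1, (4 * hidden, in_dim)),
            rng.normal(0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))


# Random, untrained parameters.
conv_w = rng.normal(0, 0.1, (N_FILTERS, FILTER_W, 4))
conv_b = np.zeros(N_FILTERS)
fwd, bwd = init_lstm(N_FILTERS, HIDDEN), init_lstm(N_FILTERS, HIDDEN)
n_steps = (SEQ_LEN - FILTER_W + 1) // POOL
out_w = rng.normal(0, 0.1, (N_TARGETS, n_steps * 2 * HIDDEN))
out_b = np.zeros(N_TARGETS)


def predict(seq):
    """Forward pass for a sequence of exactly SEQ_LEN bases -> one probability per target."""
    motifs = max_pool(conv1d_relu(one_hot(seq), conv_w, conv_b), POOL)  # (n_steps, N_FILTERS)
    states = bilstm(motifs, fwd, bwd)                                    # (n_steps, 2*HIDDEN)
    return sigmoid(out_w @ states.ravel() + out_b)                       # (N_TARGETS,)


print(predict("ACGT" * 25).round(3))
```

Because each convolution filter sees only a fixed-width window, it can score only local motifs; the bidirectional recurrent pass over the pooled motif activations makes each position's hidden state depend on motifs both upstream and downstream, which is the kind of long-range "grammar" the abstract refers to. Training (for example, binary cross-entropy summed over targets) is omitted from this sketch.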
Pages: 6