CrepHAN: cross-species prediction of enhancers by using hierarchical attention networks

Times cited: 11
Authors
Hong, Jianwei [1, 2]
Gao, Ruitian [3]
Yang, Yang [1, 4]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Agr & Biol, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Bioinformat & Biostat, Shanghai 200240, Peoples R China
[4] Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CIS-REGULATORY MODULES; CHROMATIN SIGNATURES; MAMMALIAN ENHANCERS; ELEMENTS; DATABASE; MODEL;
DOI
10.1093/bioinformatics/btab349
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Enhancers are important functional elements in genome sequences. Identifying them is very challenging because enhancer sequences are highly diverse and their locations on genomes are flexible. The interactions between enhancers and genes are still not fully understood. To speed up studies of the regulatory roles of enhancers, computational tools for enhancer prediction have emerged in recent years. In particular, thanks to the ENCODE project and advances in high-throughput experimental techniques, a large number of experimentally verified enhancers have been annotated on the human genome, which allows large-scale prediction of unknown enhancers using data-driven methods. However, except for human and some model organisms, validated enhancer annotations are scarce for most species, which makes computational identification of enhancers in their genomes more difficult.
Results: In this study, we propose a deep learning-based enhancer predictor named CrepHAN, which features a hierarchical attention neural network and word embedding-based representations of DNA sequences. We train the model on experimentally supported data from the human genome and perform experiments on human and other mammals, including mouse, cow and dog. The experimental results show that CrepHAN has a greater advantage in cross-species prediction and outperforms existing models by a large margin. In particular, for human-mouse cross-prediction, the area under the receiver operating characteristic (ROC) curve (AUC) increases by 0.033 to 0.145 on the combined tissue dataset and by 0.032 to 0.109 on tissue-specific datasets.
Pages: 3436-3443
Number of pages: 8