CrepHAN: cross-species prediction of enhancers by using hierarchical attention networks

Cited by: 11
Authors
Hong, Jianwei [1 ,2 ]
Gao, Ruitian [3 ]
Yang, Yang [1 ,4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Agr & Biol, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Bioinformat & Biostat, Shanghai 200240, Peoples R China
[4] Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
CIS-REGULATORY MODULES; CHROMATIN SIGNATURES; MAMMALIAN ENHANCERS; ELEMENTS; DATABASE; MODEL;
DOI
10.1093/bioinformatics/btab349
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Enhancers are important functional elements in genome sequences. Identifying enhancers is very challenging because enhancer sequences are highly diverse and their locations on the genome are flexible. To date, the interactions between enhancers and genes are not fully understood. To speed up studies of the regulatory roles of enhancers, computational tools for enhancer prediction have emerged in recent years. In particular, thanks to the ENCODE project and advances in high-throughput experimental techniques, a large number of experimentally verified enhancers have been annotated on the human genome, which allows large-scale prediction of unknown enhancers using data-driven methods. However, except for human and some model organisms, validated enhancer annotations are scarce for most species, which makes computational identification of enhancers in their genomes more difficult.
Results: In this study, we propose a deep learning-based enhancer predictor, named CrepHAN, which features a hierarchical attention neural network and word embedding-based representations of DNA sequences. We train the model on experimentally supported data from the human genome and perform experiments on human and other mammals, including mouse, cow and dog. The experimental results show that CrepHAN is particularly advantageous for cross-species prediction and outperforms existing models by a large margin. In particular, for human-mouse cross-predictions, the area under the receiver operating characteristic curve (AUC) increases by 0.033 to 0.145 on the combined tissue dataset and by 0.032 to 0.109 on tissue-specific datasets.
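The abstract reports its gains as differences in AUC. As a reminder of what that metric measures, the following is a minimal sketch (not the authors' evaluation code) that computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 (1 = enhancer, 0 = non-enhancer; hypothetical encoding)
    scores: iterable of predicted scores, higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not from the paper).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The quadratic pairwise loop is chosen for clarity; for large test sets a rank-based implementation or a library routine would be the more practical choice.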
Pages: 3436-3443
Page count: 8