EDeepSSP: Explainable deep neural networks for exact splice sites prediction

Cited by: 15
Authors
Amilpur, Santhosh [1 ]
Bhukya, Raju [1 ]
Affiliation
[1] Natl Inst Technol Warangal, Comp Sci & Engn, Warangal 506004, Telangana, India
Keywords
Splice site; EDeepSSP; convolutional neural network; motifs; feature activation; genome annotation; SIGNALS;
DOI
10.1142/S0219720020500249
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Splice site prediction is crucial for understanding gene regulation and gene function, and for better genome annotation. Many computational methods exist for recognizing splice sites. Although most of them achieve competent performance, their interpretability remains a challenge. Moreover, traditional machine learning methods rely on manually extracted features, which is a tedious job. To address these challenges, we propose EDeepSSP, a deep learning-based approach that employs a convolutional neural network (CNN) architecture to extract features automatically and predict splice sites effectively. EDeepSSP opens up the opaque nature of the CNN by extracting significant motifs and explaining why these motifs are vital for predicting splice sites. In this study, experiments were conducted on six benchmark acceptor and donor datasets from human, cress, and fly. The results show that EDeepSSP outperforms many state-of-the-art approaches. On human donor data, EDeepSSP achieves the highest area under the receiver operating characteristic curve (AUC-ROC) and area under the precision-recall curve (AUC-PR), at 99.32% and 99.26%, respectively. We also analyze various filter activities, feature activations, and the extracted significant motifs responsible for splice site prediction. Further, we validate the motifs learned by our model against known motifs from the JASPAR splice site database.
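The abstract states that EDeepSSP explains its CNN by extracting motifs from what its convolutional filters respond to. The sketch below illustrates that general idea under stated assumptions, and is not the authors' code: DNA is one-hot encoded, a single first-layer filter is scanned along each sequence, the maximally activating window of each sequence is kept if its ReLU activation passes a threshold, and the kept windows are stacked into a position frequency matrix (PFM). The hand-set filter `GT_FILTER` (tuned to the canonical donor "GT" dinucleotide), the threshold, and all function names are hypothetical illustrations, not trained EDeepSSP parameters.

```python
from typing import List

BASES = "ACGT"  # column order of the one-hot encoding

def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as an L x 4 one-hot matrix; unknown bases (e.g. N) become all-zero rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d(x: List[List[float]], filt: List[List[float]]) -> List[float]:
    """'Valid' 1D cross-correlation (as in CNN layers) of an L x 4 input with a K x 4 filter."""
    k = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]

def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]

def top_activating_kmers(seqs: List[str], filt: List[List[float]], threshold: float) -> List[str]:
    """Keep each sequence's maximally activating window if its activation reaches the threshold."""
    k = len(filt)
    hits = []
    for seq in seqs:
        act = relu(conv1d(one_hot(seq), filt))
        if not act:
            continue  # sequence shorter than the filter
        best = max(range(len(act)), key=act.__getitem__)
        if act[best] >= threshold:
            hits.append(seq[best:best + k])
    return hits

def position_frequency_matrix(kmers: List[str], k: int) -> List[List[int]]:
    """Count bases per position across aligned k-mers (rows: positions, cols: A, C, G, T)."""
    pfm = [[0] * 4 for _ in range(k)]
    for kmer in kmers:
        for j, b in enumerate(kmer.upper()):
            if b in BASES:
                pfm[j][BASES.index(b)] += 1
    return pfm

# Hypothetical hand-set filter: rewards G then T, penalizes every other base.
GT_FILTER = [
    [-1.0, -1.0, 1.0, -1.0],  # position 0: G
    [-1.0, -1.0, -1.0, 1.0],  # position 1: T
]

if __name__ == "__main__":
    seqs = ["AAGGTAAG", "CAGGTGAG", "AAAAAA"]
    hits = top_activating_kmers(seqs, GT_FILTER, threshold=2.0)
    print(hits)
    print(position_frequency_matrix(hits, len(GT_FILTER)))
```

In a trained network the filter weights are learned rather than hand-set, and a PFM built this way can then be compared against known motifs (for example with a motif-comparison tool) in the spirit of the JASPAR validation the abstract mentions.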
Pages: 18