Human Splice-Site Prediction with Deep Neural Networks

Cited by: 14
Author
Naito, Tatsuhiko [1]
Affiliation
[1] Univ Tokyo, Grad Sch Med, Dept Neurol, Tokyo, Japan
Keywords
deep learning; deep neural networks; splice-site prediction; splicing
DOI
10.1089/cmb.2018.0041
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to build systems that predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and have often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence and returns an answer as to whether it is a splice site. The input is 140 nucleotides long, with the consensus sequence (i.e., GT for donor sites and AG for acceptor sites) in the middle. Each input sequence is passed to the pretrained DNN model, which determines the probability that the input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. Pretraining and validation were conducted using the data set on which previously reported methods were tested. The performance evaluation showed that the proposed method can outperform the previous methods. In addition, the patterns learned by the DNNs were visualized as position frequency matrices (PFMs). Some of the PFMs were very similar to the consensus sequence. The trained DNN model and brief source code for the prediction system have been uploaded. Further improvement is expected as DNNs continue to develop.
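The input format described in the abstract can be illustrated with a short sketch. It is not the author's code: the function names `candidate_windows` and `one_hot` are hypothetical, the exact offset of the dinucleotide inside the 140-nucleotide window is an assumption (69 flanking nucleotides on each side), and one-hot encoding is assumed as a common way to feed DNA to convolutional layers rather than something the abstract states.

```python
from typing import Iterator, List, Tuple

WINDOW = 140                 # input length stated in the abstract
FLANK = (WINDOW - 2) // 2    # assumption: 69 nt on each side of the dinucleotide
ALPHABET = "ACGT"


def candidate_windows(seq: str, motif: str = "GT") -> Iterator[Tuple[int, str]]:
    """Yield (motif_start, window) for each motif occurrence that has full flanks.

    Use motif="GT" for candidate donor sites and motif="AG" for acceptor sites.
    """
    seq = seq.upper()
    # Start searching at FLANK so every hit has enough upstream sequence.
    start = seq.find(motif, FLANK)
    while start != -1 and start + len(motif) + FLANK <= len(seq):
        yield start, seq[start - FLANK : start + len(motif) + FLANK]
        start = seq.find(motif, start + 1)


def one_hot(window: str) -> List[List[int]]:
    """Encode a window as a len(window) x 4 matrix in A, C, G, T column order.

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    return [[int(base == letter) for letter in ALPHABET] for base in window]
```

Each encoded window would then be scored by a trained classifier; the model itself (convolutional plus bidirectional LSTM layers) is not reproduced here because the abstract gives no layer sizes or hyperparameters.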
Pages: 954-961
Page count: 8