Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure

Cited by: 21
Authors
Deng, Lei [1]
Liu, Youzhi [1]
Shi, Yechuan [1]
Zhang, Wenhao [2]
Yang, Chun [3]
Liu, Hui [2]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410075, Peoples R China
[2] Changzhou Univ, Aliyun Sch Big Data, Changzhou 213164, Jiangsu, Peoples R China
[3] Nanjing Med Univ, Affiliated Changzhou Peoples Hosp 2, Dept Obstet, Changzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA-binding proteins; Binding sites; Distributed representation; k-mer; Deep learning; Convolutional neural network; Bidirectional long short term memory network; PREDICTION; IDENTIFICATION; SPECIFICITIES; RECOGNITION; MOTIFS;
DOI
10.1186/s12864-020-07239-w
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: RNA-binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. Identifying RBP binding sites is a crucial step in understanding the biological mechanisms of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because biochemical assays are costly. Many studies have therefore used machine learning methods to predict binding sites. Deep learning in particular is increasingly used in bioinformatics because it can learn generalized representations from DNA and protein sequences.
Results: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets.
Conclusions: Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationships and similarities between k-mers, and thus improve predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
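To make the k-mer idea in the abstract concrete, the following is a minimal sketch of how an RNA sequence could be split into overlapping k-mer "words" before a word embedding model is trained on them. It is not taken from the DeepRKE source code. The function names (kmer_tokens, build_vocab), the choice of k = 3, the stride of 1, and the example sequence are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into overlapping k-mers (hypothetical helper).

    k and stride are illustrative defaults, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Made-up example sequence, used only to show the tokenization.
tokens = kmer_tokens("AUGGCUA", k=3)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

The integer ids would then index rows of an embedding matrix, so that each k-mer is represented by a dense vector instead of a one-hot vector. How the embedding itself is trained is not specified here and is left out of the sketch.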
Pages: 10