Enhanced prediction of RNA solvent accessibility with long short-term memory neural networks and improved sequence profiles

Cited by: 29
Authors
Sun, Saisai [1]
Wu, Qi [1]
Peng, Zhenling [2]
Yang, Jianyi [1]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin 300071, Peoples R China
[2] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
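Funding
National Natural Science Foundation of China
Keywords
PROTEIN; ALIGNMENT; SECONDARY
DOI
10.1093/bioinformatics/bty876
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract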
Motivation: The de novo prediction of RNA tertiary structure remains a grand challenge. Predicted RNA solvent accessibility provides an opportunity to address this challenge. To the best of our knowledge, only one method (RNAsnap) is available for RNA solvent accessibility prediction. However, its performance is unsatisfactory for protein-free RNAs.
Results: We developed RNAsol, a new algorithm to predict RNA solvent accessibility. RNAsol is built on improved sequence profiles derived from covariance models and is trained with long short-term memory (LSTM) neural networks. Independent tests on the same datasets used for RNAsnap show that RNAsol achieves a mean Pearson's correlation coefficient (PCC) of 0.43/0.26 for protein-bound/protein-free RNA molecules, which is 26.5%/136.4% higher than that of RNAsnap. When the training set is enlarged to include both types of RNAs, the PCCs increase to 0.49 and 0.46 for protein-bound and protein-free RNAs, respectively. The success of RNAsol is attributed to two factors: the improved sequence profiles constructed by sequence-profile alignment and the enhanced training by LSTM neural networks.
Availability and implementation: http://yanglab.nankai.edu.cn/RNAsol/
Supplementary information: Supplementary data are available at Bioinformatics online.
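Note on the reported metric: the following is a minimal sketch, not taken from the RNAsol code, of how a Pearson's correlation coefficient can be computed and how a "percent higher" figure relates to a baseline score. The function names pearson_cc and implied_baseline and the toy accessibility values are hypothetical illustrations. The baseline PCCs it prints are back-calculated from the percentages quoted in the abstract, not values stated in this record.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def implied_baseline(new_pcc, percent_higher):
    """Baseline score implied by 'new_pcc is percent_higher % above baseline'."""
    return new_pcc / (1.0 + percent_higher / 100.0)


# Hypothetical per-nucleotide accessibility values, for illustration only.
toy_predicted = [0.2, 0.5, 0.9, 0.4]
toy_observed = [0.1, 0.6, 0.8, 0.5]
toy_pcc = pearson_cc(toy_predicted, toy_observed)
print(f"toy PCC: {toy_pcc:.3f}")

# Back-calculation from the figures quoted in the abstract (0.43 / 0.26).
bound_baseline = implied_baseline(0.43, 26.5)
free_baseline = implied_baseline(0.26, 136.4)
print(f"implied baseline, protein-bound: {bound_baseline:.2f}")
print(f"implied baseline, protein-free: {free_baseline:.2f}")
```

Under these assumptions, the large relative gain for protein-free RNAs reflects a low implied baseline as much as a high absolute PCC, which is why the abstract reports both the absolute values and the percentages.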
Pages: 1686-1691
Page count: 6