UFold: fast and accurate RNA secondary structure prediction with deep learning

Cited by: 91
Authors
Fu, Laiyi [1, 2]
Cao, Yingxin [2, 5, 6]
Wu, Jie [3]
Peng, Qinke [1]
Nie, Qing [4, 5, 6]
Xie, Xiaohui [2]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Syst Engn Inst, Xian 710049, Shaanxi, Peoples R China
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[3] Univ Calif Irvine, Dept Biol Chem, Irvine, CA 92697 USA
[4] Univ Calif Irvine, Dept Math, Irvine, CA 92697 USA
[5] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA 92697 USA
[6] Univ Calif Irvine, NSF Simons Ctr Multiscale Cell Fate Res, Irvine, CA 92697 USA
Keywords
WEB SERVER; PROTEIN; DESIGN;
DOI
10.1093/nar/gkab1074
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at . Code is available at .
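The abstract mentions an image-like representation of RNA sequences but does not spell out how it is built. The following is a minimal sketch of one plausible construction, assuming each channel is the outer product of two one-hot base indicators, which gives a 16-channel L x L tensor of the kind a fully convolutional network could consume. The function names `one_hot` and `image_like`, the base ordering `AUCG`, and the channel layout are all illustrative assumptions, not details taken from this record.

```python
import numpy as np

# Assumed base ordering; the record does not specify one.
BASES = "AUCG"

def one_hot(seq):
    """Return an (L, 4) one-hot matrix; symbols outside BASES stay all-zero."""
    x = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def image_like(seq):
    """Return a (16, L, L) tensor (hypothetical layout).

    Channel 4*a + b is 1 at position (i, j) exactly when
    seq[i] == BASES[a] and seq[j] == BASES[b].
    """
    x = one_hot(seq)
    # Outer product over sequence positions and base identities: (4, 4, L, L).
    img = np.einsum("ia,jb->abij", x, x)
    return img.reshape(len(BASES) ** 2, len(seq), len(seq))

img = image_like("GGGAAACCC")
print(img.shape)
```

Under these assumptions, every position pair (i, j) activates exactly one channel, so the tensor carries the identity of both bases at every candidate pairing site while remaining a fixed-depth, image-shaped input.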
Pages: 12