Assessing Global-Local Secondary Structure Fingerprints to Classify RNA Sequences With Deep Learning

Cited by: 3
Authors
Sutanto, Kevin [1]
Turcotte, Marcel [1]
Affiliations
[1] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
RNA classification; non-coding RNA; secondary structure; deep learning; k-mers; NONCODING RNAS; COMPUTATIONAL IDENTIFICATION; PREDICTION
DOI
10.1109/TCBB.2021.3118358
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
RNA elements that are transcribed but not translated into proteins are called non-coding RNAs (ncRNAs). They play wide-ranging roles in biological processes and disorders. As with proteins, their structure is often intimately linked to their function. Many examples have been documented in which structure is conserved across taxa despite sequence divergence. Thus, structure is often used to identify function. Specifically, the secondary structure is predicted, and ncRNAs with similar structures are assumed to have the same or similar functions. However, a strand of RNA can fold into multiple possible structures, and some strands even fold differently in vivo and in vitro. Furthermore, ncRNAs often function as RNA-protein complexes, which can affect structure. For these reasons, we hypothesized that using one structure per sequence may discard information, possibly resulting in poorer classification accuracy. Therefore, we propose using secondary structure fingerprints comprising two categories: a higher-level representation derived from RNA-As-Graphs (RAG), and free energy fingerprints based on a curated repertoire of small structural motifs. The fingerprints take into account the difference between global and local structural matches. We also evaluated our deep learning architecture with k-mers. By combining our global-local fingerprints with 6-mers, we achieved an accuracy, precision, and recall of 91.04%, 91.10%, and 91.00%, respectively.
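To make the k-mer part of the abstract concrete, the following is a minimal sketch of how a 6-mer frequency profile could be computed for one sequence and concatenated with a structural fingerprint vector. It is an illustration under stated assumptions, not the authors' implementation: the function names `kmer_profile` and `combine_features`, the normalization to relative frequencies, the skipping of windows that contain ambiguous bases, and the plain concatenation are all hypothetical choices that the abstract does not specify.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: sequences use the four standard RNA bases


def kmer_profile(seq, k=6):
    """Return the relative frequency of every possible k-mer (4**k values).

    Assumptions (not from the paper): T is treated as U, windows containing
    any other symbol (e.g. N) are skipped, and counts are normalized by the
    number of valid windows.
    """
    seq = seq.upper().replace("T", "U")
    # Map each k-mer to a fixed position, in lexicographic order.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def combine_features(fingerprint, seq, k=6):
    """Concatenate a (hypothetical) fingerprint vector with the k-mer profile."""
    return list(fingerprint) + kmer_profile(seq, k)
```

With k = 6 the profile has 4^6 = 4096 entries, so the combined feature vector is the fingerprint length plus 4096. How the fingerprints themselves are computed is described in the paper, not here.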
Pages: 2736-2747
Page count: 12