Sequence-based heuristics for faster annotation of non-coding RNA families

Cited by: 60
Authors
Weinberg, Z [1]
Ruzzo, WL
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
DOI
10.1093/bioinformatics/bti743
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.
Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
Pages: 35-39
Number of pages: 5