Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering

Cited by: 355
Authors
Will, Sebastian
Reiche, Kristin
Hofacker, Ivo L.
Stadler, Peter F.
Backofen, Rolf [1]
Affiliations
[1] Univ Freiburg, Inst Comp Sci, Bioinformat Grp, Freiburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, D-7010 Leipzig, Germany
[3] Univ Vienna, Dept Theoret Chem, Vienna, Austria
DOI
10.1371/journal.pcbi.0030065
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77-i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to date no representative has been experimentally characterized.
Pages: 680-691
Page count: 12