RNAML: A standard syntax for exchanging RNA information

Cited by: 72
Authors
Waugh, A
Gendron, P
Altman, R
Brown, JW
Case, D
Gautheret, D
Harvey, SC
Leontis, N
Westbrook, J
Westhof, E
Zuker, M
Major, F
Affiliations
[1] Univ Montreal, Dept Informat & Rech Operat, Montreal, PQ H3C 3J7, Canada
[2] Stanford Univ, Ctr Med, Stanford Med Informat, Stanford, CA 94305 USA
[3] N Carolina State Univ, Dept Microbiol, Raleigh, NC 27695 USA
[4] Scripps Res Inst, Dept Mol Biol, La Jolla, CA 92037 USA
[5] Ctr Natl Rech Sci, F-28809 Marseille, France
[6] Univ Alabama, Dept Biochem, Birmingham, AL 35294 USA
[7] Bowling Green State Univ, Dept Chem, Bowling Green, OH 43403 USA
[8] Rutgers State Univ, Dept Chem, Piscataway, NJ 08855 USA
[9] IBMC, Ctr Natl Rech Sci, F-67084 Strasbourg, France
[10] Rensselaer Polytech Inst, Troy, NY 12180 USA
Keywords
data storage; file format; knowledge representation; XML
DOI
10.1017/S1355838202028017
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Analyzing a single data set using multiple RNA informatics programs often requires a file format conversion between each pair of programs, significantly hampering productivity. To facilitate the interoperation of these programs, we propose a syntax to exchange basic RNA molecular information. This RNAML syntax allows for the storage and the exchange of information about RNA sequence and secondary and tertiary structures. The syntax permits the description of higher level information about the data including, but not restricted to, base pairs, base triples, and pseudoknots. A class-oriented approach allows us to represent data common to a given set of RNA molecules, such as a sequence alignment and a consensus secondary structure. Documentation about experiments and computations, as well as references to journals and external databases, is included in the syntax. The chief challenge in creating such a syntax was to determine the appropriate scope of usage and to ensure extensibility as new needs arise. The syntax complies with the eXtensible Markup Language (XML) recommendations, a widely accepted standard for syntax specifications. In addition to the various generic packages that exist to read and interpret XML formats, an XML processor was developed and put in the open-source MC-Core library for nucleic acid and protein structure computer manipulation.
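
To make the idea of an XML exchange format concrete, the following is a minimal sketch of how a generic XML package could read a sequence and its base pairs from an RNAML-like document. The element and attribute names (`molecule`, `sequence`, `structure`, `base-pair`, `five-prime`, `three-prime`) and the helper functions are illustrative assumptions, not taken from the published RNAML specification; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical RNAML-like document. Element and attribute names are
# illustrative assumptions and may differ from the real RNAML DTD.
DOC = """\
<rnaml>
  <molecule id="example">
    <sequence>GGGAAACCC</sequence>
    <structure>
      <base-pair five-prime="1" three-prime="9"/>
      <base-pair five-prime="2" three-prime="8"/>
      <base-pair five-prime="3" three-prime="7"/>
    </structure>
  </molecule>
</rnaml>
"""


def read_molecule(xml_text):
    """Return (id, sequence, base pairs) for the first molecule element."""
    root = ET.fromstring(xml_text)
    mol = root.find("molecule")
    seq = mol.findtext("sequence").strip()
    # Base pairs are stored as 1-based (5' index, 3' index) tuples.
    pairs = [
        (int(bp.get("five-prime")), int(bp.get("three-prime")))
        for bp in mol.iter("base-pair")
    ]
    return mol.get("id"), seq, pairs


def to_dot_bracket(seq, pairs):
    """Render nested base pairs as a dot-bracket string."""
    chars = ["."] * len(seq)
    for i, j in pairs:
        chars[i - 1] = "("
        chars[j - 1] = ")"
    return "".join(chars)


mol_id, seq, pairs = read_molecule(DOC)
print(mol_id, seq, to_dot_bracket(seq, pairs))
```

Plain dot-bracket notation can only express nested pairs, so it cannot carry the pseudoknots and base triples mentioned in the abstract; an explicit list of pair elements, as sketched here, does not have that limitation.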
Pages: 707-717
Number of pages: 11