Detecting overlapping coding sequences in virus genomes

Cited by: 72
Authors
Firth, AE [1]
Brown, CM [1]
Affiliations
[1] Univ Otago, Dept Biochem, Dunedin, New Zealand
DOI
10.1186/1471-2105-7-75
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret, especially within overlapping genes; and viruses often employ non-canonical translational mechanisms (e.g. frameshifting, stop codon readthrough, leaky scanning and internal ribosome entry sites), which can conceal potentially coding open reading frames (ORFs).
Results: In a previous paper we introduced a new statistic, MLOGD (Maximum Likelihood Overlapping Gene Detector), for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics that are useful for analysing an input multiple sequence alignment.
Conclusion: MLOGD is an easy-to-use tool for virus genome annotation, for detecting new CDSs (in particular overlapping or short CDSs), and for analysing overlapping CDSs following frameshift sites. The software, web server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.
Pages: 6
Related papers
50 in total
  • [1] Detecting overlapping coding sequences in virus genomes
    Andrew E Firth
    Chris M Brown
    BMC Bioinformatics, 7
  • [2] Detecting overlapping coding sequences with pairwise alignments
    Firth, AE
    Brown, CM
    BIOINFORMATICS, 2005, 21 (03) : 282 - 292
  • [3] Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes
    Lin, Michael F.
    Kheradpour, Pouya
    Washietl, Stefan
    Parker, Brian J.
    Pedersen, Jakob S.
    Kellis, Manolis
    GENOME RESEARCH, 2011, 21 (11) : 1916 - 1928
  • [4] A method for coding linear displacements by overlapping sequences
    Kharlamov, BP
    AUTOMATION AND REMOTE CONTROL, 1996, 57 (01) : 138 - 141
  • [5] Novel overlapping coding sequences in Chlamydia trachomatis
    Jensen, Klaus T.
    Petersen, Lise
    Falk, Soren
    Iversen, Pernille
    Andersen, Peter
    Theisen, Michael
    Krogh, Anders
    FEMS MICROBIOLOGY LETTERS, 2006, 265 (01) : 106 - 117
  • [7] Overlapping codes within protein-coding sequences
    Itzkovitz, Shalev
    Hodis, Eran
    Segal, Eran
    GENOME RESEARCH, 2010, 20 (11) : 1582 - 1589
  • [8] ICDS database: interrupted CoDing sequences in prokaryotic genomes
    Perrodou, Emmanuel
    Deshayes, Caroline
    Muller, Jean
    Schaeffer, Christine
    Van Dorsselaer, Alain
    Ripp, Raymond
    Poch, Olivier
    Reyrat, Jean-Marc
    Lecompte, Odile
    NUCLEIC ACIDS RESEARCH, 2006, 34 : D338 - D343
  • [9] Extremely conserved non-coding sequences in vertebrate genomes
    Makunin, I. V.
    Stephen, S.
    Pheasant, M.
    Bejerano, G.
    Kent, J. W.
    Haussler, D.
    Mattick, J. S.
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2004, : 138 - 140
  • [10] Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes
    Neuhaus, Klaus
    Oelke, Daniela
    Fuerst, David
    Scherer, Siegfried
    Keim, Daniel A.
    EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS, PROCEEDINGS, 2010, 6023 : 228 - +