Detecting overlapping coding sequences in virus genomes

Cited by: 72
Authors
Firth, AE [1]
Brown, CM [1]
Affiliation
[1] Univ Otago, Dept Biochem, Dunedin, New Zealand
DOI
10.1186/1471-2105-7-75
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret, especially within overlapping genes; and viruses often employ non-canonical translational mechanisms (e.g. frameshifting, stop codon readthrough, leaky scanning and internal ribosome entry sites), which can conceal potentially coding open reading frames (ORFs).

Results: In a previous paper we introduced a new statistic, MLOGD (Maximum Likelihood Overlapping Gene Detector), for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics useful for analysing an input multiple sequence alignment.

Conclusion: MLOGD is an easy-to-use tool for virus genome annotation, for detecting new CDSs (in particular overlapping or short CDSs), and for analysing overlapping CDSs following frameshift sites. The software, web server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.
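To make the idea of ORFs overlapping in different reading frames concrete, here is a toy sketch. It is not MLOGD: MLOGD is a maximum-likelihood statistic applied to sequence alignments, whereas this code only scans a single sequence for candidate ORFs. The helper names `find_orfs` and `overlapping_pairs`, the `min_codons` threshold, the ATG-to-stop ORF definition, the standard-code stop codons and the restriction to the forward strand are all illustrative assumptions.

```python
from itertools import combinations

# Assumption: standard genetic code stop codons only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (frame, start, end) tuples for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    Within one frame, each ORF starts at the first ATG after the previous stop.
    An ATG with no in-frame stop before the end of the sequence is ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only, in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # ORF length in codons, counting the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def overlapping_pairs(orfs):
    """Return (orf_a, orf_b, overlap_nt) for ORFs in different frames that share nucleotides."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[0] == b[0]:
            continue  # same-frame ORFs from find_orfs never overlap
        overlap = min(a[2], b[2]) - max(a[1], b[1])
        if overlap > 0:
            pairs.append((a, b, overlap))
    return pairs


if __name__ == "__main__":
    demo = "ATGAATGCCTGAGTAA"
    orfs = find_orfs(demo, min_codons=2)
    print(orfs)
    print(overlapping_pairs(orfs))
```

On the 16-nt demo sequence with `min_codons=2`, the scan finds ATG...TGA in frame 0 (coordinates 0-12) and ATG...TAA in frame 1 (coordinates 4-16), which share 8 nt. A scan like this cannot see CDSs that are reached only through frameshifting or stop codon readthrough, and it says nothing about whether an ORF is actually coding, which is the question MLOGD is designed to address.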
Pages: 6
Related papers (13 in total)
[1] [Anonymous]. R Project for Statistical Computing (Version 3.0.2).
[2] [Anonymous]. PHYLIP PHYLOGENY INF, 2004.
[3] Badger, JH; Olsen, GJ. CRITICA: Coding region identification tool invoking comparative analysis. MOLECULAR BIOLOGY AND EVOLUTION, 1999, 16(04): 512-524.
[4] Bao, YM; Federhen, S; Leipe, D; Pham, V; Resenchuk, S; Rozanov, M; Tatusov, R; Tatusova, T. National Center for Biotechnology Information Viral Genomes Project. JOURNAL OF VIROLOGY, 2004, 78(14): 7291-7298.
[5] Firth, AE; Brown, CM. Detecting overlapping coding sequences with pairwise alignments. BIOINFORMATICS, 2005, 21(03): 282-292.
[6] Fukuda, Y; Nakayama, Y; Tomita, M. On dynamics of overlapping genes in bacterial genomes. GENE, 2003, 323: 181-187.
[7] Hammell, AB. GENOME RESEARCH, 1999, 9: 417.
[8] Johnson, ZI; Chisholm, SW. Properties of overlapping genes are conserved across microbial genomes. GENOME RESEARCH, 2004, 14(11): 2268-2272.
[9] Majoros, WH; Pertea, M; Salzberg, SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. BIOINFORMATICS, 2005, 21(09): 1782-1788.
[10] Mills, R; Rozanov, M; Lomsadze, A; Tatusova, T; Borodovsky, M. Improving gene annotation of complete viral genomes. NUCLEIC ACIDS RESEARCH, 2003, 31(23): 7041-7055.