GeneMark.hmm: new solutions for gene finding

被引:1180
作者
Lukashin, AV
Borodovsky, M [1 ]
机构
[1] Georgia Inst Technol, Sch Biol, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Sch Math, Atlanta, GA 30332 USA
基金
美国国家卫生研究院;
关键词
D O I
10.1093/nar/26.4.1107
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
The number of completely sequenced bacterial genomes has been growing fast. There are computer methods available for finding genes but yet there is a need for more accurate algorithms. The GeneMark.hmm algorithm presented here was designed to improve the gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into naturally derived hidden Markov model framework with gene boundaries modeled as transitions between hidden states. We also used the specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets including 10 complete bacterial genomes. It was shown that the new algorithm is significantly more accurate than GeneMark in exact gene prediction. Interestingly, the high gene finding accuracy was observed even in the case when Markov models of order zero, one and two were used. We present the analysis of false positive and false negative predictions with the caution that these categories are not precisely defined if the public database annotation is used as a control.
引用
收藏
页码:1107 / 1115
页数:9
相关论文
共 31 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   HIDDEN MARKOV-MODELS OF BIOLOGICAL PRIMARY SEQUENCE INFORMATION [J].
BALDI, P ;
CHAUVIN, Y ;
HUNKAPILLER, T ;
MCCLURE, MA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (03) :1059-1063
[3]   The complete genome sequence of Escherichia coli K-12 [J].
Blattner, FR ;
Plunkett, G ;
Bloch, CA ;
Perna, NT ;
Burland, V ;
Riley, M ;
ColladoVides, J ;
Glasner, JD ;
Rode, CK ;
Mayhew, GF ;
Gregor, J ;
Davis, NW ;
Kirkpatrick, HA ;
Goeden, MA ;
Rose, DJ ;
Mau, B ;
Shao, Y .
SCIENCE, 1997, 277 (5331) :1453-+
[4]   DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES [J].
BORODOVSKY, M ;
MCININCH, JD ;
KOONIN, EV ;
RUDD, KE ;
MEDIGUE, C ;
DANCHIN, A .
NUCLEIC ACIDS RESEARCH, 1995, 23 (17) :3554-3562
[5]   GENMARK - PARALLEL GENE RECOGNITION FOR BOTH DNA STRANDS [J].
BORODOVSKY, M ;
MCININCH, J .
COMPUTERS & CHEMISTRY, 1993, 17 (02) :123-133
[6]   Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii [J].
Bult, CJ ;
White, O ;
Olsen, GJ ;
Zhou, LX ;
Fleischmann, RD ;
Sutton, GG ;
Blake, JA ;
FitzGerald, LM ;
Clayton, RA ;
Gocayne, JD ;
Kerlavage, AR ;
Dougherty, BA ;
Tomb, JF ;
Adams, MD ;
Reich, CI ;
Overbeek, R ;
Kirkness, EF ;
Weinstock, KG ;
Merrick, JM ;
Glodek, A ;
Scott, JL ;
Geoghagen, NSM ;
Weidman, JF ;
Fuhrmann, JL ;
Nguyen, D ;
Utterback, TR ;
Kelley, JM ;
Peterson, JD ;
Sadow, PW ;
Hanna, MC ;
Cotton, MD ;
Roberts, KM ;
Hurst, MA ;
Kaine, BP ;
Borodovsky, M ;
Klenk, HP ;
Fraser, CM ;
Smith, HO ;
Woese, CR ;
Venter, JC .
SCIENCE, 1996, 273 (5278) :1058-1073
[7]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[8]  
CHURCHILL GA, 1989, B MATH BIOL, V51, P79
[9]   WHOLE-GENOME RANDOM SEQUENCING AND ASSEMBLY OF HAEMOPHILUS-INFLUENZAE RD [J].
FLEISCHMANN, RD ;
ADAMS, MD ;
WHITE, O ;
CLAYTON, RA ;
KIRKNESS, EF ;
KERLAVAGE, AR ;
BULT, CJ ;
TOMB, JF ;
DOUGHERTY, BA ;
MERRICK, JM ;
MCKENNEY, K ;
SUTTON, G ;
FITZHUGH, W ;
FIELDS, C ;
GOCAYNE, JD ;
SCOTT, J ;
SHIRLEY, R ;
LIU, LI ;
GLODEK, A ;
KELLEY, JM ;
WEIDMAN, JF ;
PHILLIPS, CA ;
SPRIGGS, T ;
HEDBLOM, E ;
COTTON, MD ;
UTTERBACK, TR ;
HANNA, MC ;
NGUYEN, DT ;
SAUDEK, DM ;
BRANDON, RC ;
FINE, LD ;
FRITCHMAN, JL ;
FUHRMANN, JL ;
GEOGHAGEN, NSM ;
GNEHM, CL ;
MCDONALD, LA ;
SMALL, KV ;
FRASER, CM ;
SMITH, HO ;
VENTER, JC .
SCIENCE, 1995, 269 (5223) :496-512
[10]   THE MINIMAL GENE COMPLEMENT OF MYCOPLASMA-GENITALIUM [J].
FRASER, CM ;
GOCAYNE, JD ;
WHITE, O ;
ADAMS, MD ;
CLAYTON, RA ;
FLEISCHMANN, RD ;
BULT, CJ ;
KERLAVAGE, AR ;
SUTTON, G ;
KELLEY, JM ;
FRITCHMAN, JL ;
WEIDMAN, JF ;
SMALL, KV ;
SANDUSKY, M ;
FUHRMANN, J ;
NGUYEN, D ;
UTTERBACK, TR ;
SAUDEK, DM ;
PHILLIPS, CA ;
MERRICK, JM ;
TOMB, JF ;
DOUGHERTY, BA ;
BOTT, KF ;
HU, PC ;
LUCIER, TS ;
PETERSON, SN ;
SMITH, HO ;
HUTCHISON, CA ;
VENTER, JC .
SCIENCE, 1995, 270 (5235) :397-403