Self-identification of protein-coding regions in microbial genomes

被引:45
作者
Audic, S [1 ]
Claverie, JM [1 ]
机构
[1] CNRS, Struct & Genet Informat Lab, F-13402 Marseille, France
关键词
D O I
10.1073/pnas.95.17.10026
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically perform the partition of genomic sequences into three subsets shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chern. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome are required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
引用
收藏
页码:10026 / 10031
页数:6
相关论文
共 30 条
[1]   A MARKOV ANALYSIS OF DNA-SEQUENCES [J].
ALMAGOR, H .
JOURNAL OF THEORETICAL BIOLOGY, 1983, 104 (04) :633-645
[2]   Detection of eukaryotic promoters using Markov transition matrices [J].
Audic, S ;
Claverie, JM .
COMPUTERS & CHEMISTRY, 1997, 21 (04) :223-227
[3]   GenBank [J].
Benson, DA ;
Boguski, MS ;
Lipman, DJ ;
Ostell, J ;
Ouellette, BFF .
NUCLEIC ACIDS RESEARCH, 1998, 26 (01) :1-7
[4]   The complete genome sequence of Escherichia coli K-12 [J].
Blattner, FR ;
Plunkett, G ;
Bloch, CA ;
Perna, NT ;
Burland, V ;
Riley, M ;
ColladoVides, J ;
Glasner, JD ;
Rode, CK ;
Mayhew, GF ;
Gregor, J ;
Davis, NW ;
Kirkpatrick, HA ;
Goeden, MA ;
Rose, DJ ;
Mau, B ;
Shao, Y .
SCIENCE, 1997, 277 (5331) :1453-+
[5]  
Borodovskii M Iu, 1986, Mol Biol (Mosk), V20, P1024
[6]   DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES [J].
BORODOVSKY, M ;
MCININCH, JD ;
KOONIN, EV ;
RUDD, KE ;
MEDIGUE, C ;
DANCHIN, A .
NUCLEIC ACIDS RESEARCH, 1995, 23 (17) :3554-3562
[7]   GENMARK - PARALLEL GENE RECOGNITION FOR BOTH DNA STRANDS [J].
BORODOVSKY, M ;
MCININCH, J .
COMPUTERS & CHEMISTRY, 1993, 17 (02) :123-133
[8]   Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii [J].
Bult, CJ ;
White, O ;
Olsen, GJ ;
Zhou, LX ;
Fleischmann, RD ;
Sutton, GG ;
Blake, JA ;
FitzGerald, LM ;
Clayton, RA ;
Gocayne, JD ;
Kerlavage, AR ;
Dougherty, BA ;
Tomb, JF ;
Adams, MD ;
Reich, CI ;
Overbeek, R ;
Kirkness, EF ;
Weinstock, KG ;
Merrick, JM ;
Glodek, A ;
Scott, JL ;
Geoghagen, NSM ;
Weidman, JF ;
Fuhrmann, JL ;
Nguyen, D ;
Utterback, TR ;
Kelley, JM ;
Peterson, JD ;
Sadow, PW ;
Hanna, MC ;
Cotton, MD ;
Roberts, KM ;
Hurst, MA ;
Kaine, BP ;
Borodovsky, M ;
Klenk, HP ;
Fraser, CM ;
Smith, HO ;
Woese, CR ;
Venter, JC .
SCIENCE, 1996, 273 (5278) :1058-1073
[9]   The difficulty of identifying genes in anonymous vertebrate sequences [J].
Claverie, JM ;
Poirot, O ;
Lopez, F .
COMPUTERS & CHEMISTRY, 1997, 21 (04) :203-214
[10]   Computational methods for the identification of genes in vertebrate genomic sequences [J].
Claverie, JM .
HUMAN MOLECULAR GENETICS, 1997, 6 (10) :1735-1744