Bacterial start site prediction

Cited by: 33
Authors
Hannenhalli, SS
Hayes, WS
Hatzigeorgiou, AG
Fickett, JW
Affiliations
[1] SmithKline Beecham Pharmaceut, Bioinformat, King Of Prussia, PA 19406 USA
[2] Synapt Ltd, Voutes Heraklion 71110, Greece
Keywords
DOI
10.1093/nar/27.17.3577
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes into account multiple features of a potential start site, viz., ribosome binding site (RBS) binding energy, distance of the RBS from the start codon, distance from the beginning of the maximal ORF to the start codon, the start codon itself and the coding/non-coding potential around the start site. Mixed integer programming was used to optimize the discriminatory system. The accuracy of this approach is up to 90%, compared to 70% using the most common tools in fully automated mode (that is, without expert human post-processing of results). The approach is evaluated using Bacillus subtilis, Escherichia coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of bacterial genomes, since B.subtilis is a Gram-positive bacterium, E.coli is a Gram-negative bacterium and P.furiosus is an archaebacterium. A significant problem is generating a set of 'true' start sites for algorithm training, in the absence of experimental work. We found that sequence conservation between P.furiosus and the related Pyrococcus horikoshii clearly delimited the gene start in many cases, providing a sufficient training set.
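As a rough illustration of two of the features named in the abstract (the start codon itself, and the distance from the beginning of the maximal ORF to the start codon), the following Python sketch enumerates candidate start sites within one maximal ORF. It is not the authors' implementation. The function name `candidate_starts`, the start codon set (ATG, GTG, TTG) and the forward-strand, 0-based coordinate convention are assumptions made for this example; the RBS binding energy, the RBS distance, the coding potential and the mixed integer programming step are not modeled.

```python
# Hypothetical sketch: enumerate candidate start codons in a maximal ORF.
# Assumed (not stated in the abstract): start codons ATG/GTG/TTG,
# forward strand only, 0-based coordinates.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_index):
    """Return candidate start sites for the ORF ending at stop_index.

    seq        -- DNA string (forward strand)
    stop_index -- 0-based index of the first base of the ORF's stop codon
    """
    seq = seq.upper()

    # Walk upstream codon by codon until the previous in-frame stop codon
    # (or the start of the sequence); the maximal ORF begins just after it.
    pos = stop_index - 3
    while pos >= 0 and seq[pos:pos + 3] not in STOP_CODONS:
        pos -= 3
    orf_begin = pos + 3

    # Every in-frame start codon inside the maximal ORF is a candidate.
    candidates = []
    for i in range(orf_begin, stop_index, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            candidates.append({
                "position": i,                         # index of the codon's first base
                "codon": codon,                        # feature: the start codon itself
                "offset_from_orf_begin": i - orf_begin,  # feature: distance from ORF beginning
            })
    return candidates


if __name__ == "__main__":
    # Toy sequence: upstream stop, two candidate starts, terminating stop.
    toy = "TAA" + "CCC" + "ATG" + "AAA" + "GTG" + "CCC" + "TAG"
    for site in candidate_starts(toy, stop_index=18):
        print(site)
```

In a full system along the lines the abstract describes, each candidate would carry further feature values, and a trained discriminator would choose among the candidates of one ORF.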
Pages: 3577-3582
Page count: 6
Related papers
24 in total
  • [1] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [2] [Anonymous], 1990, METHOD ENZYMOL
  • [3] Beasley J. E., 1996, ADV LINEAR INTEGER P
  • [4] The complete genome sequence of Escherichia coli K-12
    Blattner, FR
    Plunkett, G
    Bloch, CA
    Perna, NT
    Burland, V
    Riley, M
    ColladoVides, J
    Glasner, JD
    Rode, CK
    Mayhew, GF
    Gregor, J
    Davis, NW
    Kirkpatrick, HA
    Goeden, MA
    Rose, DJ
    Mau, B
    Shao, Y
    [J]. SCIENCE, 1997, 277 (5331) : 1453 - +
  • [5] DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES
    BORODOVSKY, M
    MCININCH, JD
    KOONIN, EV
    RUDD, KE
    MEDIGUE, C
    DANCHIN, A
    [J]. NUCLEIC ACIDS RESEARCH, 1995, 23 (17) : 3554 - 3562
  • [6] GENMARK - PARALLEL GENE RECOGNITION FOR BOTH DNA STRANDS
    BORODOVSKY, M
    MCININCH, J
    [J]. COMPUTERS & CHEMISTRY, 1993, 17 (02) : 123 - 133
  • [7] FICKETT J, 1996, TRENDS GENET, V12, P1058
  • [8] Fourer R, 1993, AMPL MODELING LANGUA
  • [9] IMPROVED FREE-ENERGY PARAMETERS FOR PREDICTIONS OF RNA DUPLEX STABILITY
    FREIER, SM
    KIERZEK, R
    JAEGER, JA
    SUGIMOTO, N
    CARUTHERS, MH
    NEILSON, T
    TURNER, DH
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1986, 83 (24) : 9373 - 9377
  • [10] Combining diverse evidence for gene recognition in completely sequenced bacterial genomes
    Frishman, D
    Mironov, A
    Mewes, HW
    Gelfand, M
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (12) : 2941 - 2947