A novel method for accurate operon predictions in all sequenced prokaryotes

Cited by: 285
Authors
Price, MN
Huang, KH
Alm, EJ
Arkin, AP
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[2] Howard Hughes Med Inst, Berkeley, CA USA
[3] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
DOI
10.1093/nar/gki232
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85% and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC 6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
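The abstract describes combining intergenic distance with comparative genomic evidence to decide whether two adjacent genes share an operon. One common way to combine such features is to sum per-feature log-likelihood ratios (a naive Bayes classifier). The sketch below illustrates that general idea only; it is not the authors' implementation. The binning in the hypothetical `distance_bin`, the add-one smoothing, the single boolean `conserved` flag standing in for "comparative genomic measures", and the toy labels are all assumptions. The abstract notes that the published method tailors itself to each genome from sequence information alone; this sketch skips that step and simply takes labeled pairs as input.

```python
import math
from collections import Counter

# Hypothetical sketch (not the authors' method): score an adjacent, same-strand
# gene pair as "same operon" by summing naive Bayes log-likelihood ratios for
# (a) binned intergenic distance and (b) one boolean comparative feature.


def distance_bin(d, width=25, lo=-50, hi=300):
    """Clamp an intergenic distance in bp (negative = overlap) and return a bin index."""
    d = max(lo, min(hi, d))
    return (d - lo) // width


def fit_llr(values, labels, pseudo=1.0):
    """Map each observed value to log P(v | operon) - log P(v | non-operon), add-one smoothed."""
    pos = Counter(v for v, y in zip(values, labels) if y)
    neg = Counter(v for v, y in zip(values, labels) if not y)
    support = set(values)
    n_pos, n_neg, k = sum(pos.values()), sum(neg.values()), len(support)
    return {
        v: math.log((pos[v] + pseudo) / (n_pos + pseudo * k))
        - math.log((neg[v] + pseudo) / (n_neg + pseudo * k))
        for v in support
    }


def train(pairs, labels):
    """pairs: list of (distance_bp, conserved); labels: True if the pair shares an operon."""
    dist_llr = fit_llr([distance_bin(d) for d, _ in pairs], labels)
    cons_llr = fit_llr([c for _, c in pairs], labels)
    n_pos = sum(labels)
    prior = math.log((n_pos + 1) / (len(labels) - n_pos + 1))  # smoothed prior log-odds
    return dist_llr, cons_llr, prior


def predict(model, distance, conserved):
    """Posterior log-odds of 'same operon'; values never seen in training add 0 (no evidence)."""
    dist_llr, cons_llr, prior = model
    return (prior
            + dist_llr.get(distance_bin(distance), 0.0)
            + cons_llr.get(conserved, 0.0))


if __name__ == "__main__":
    # Toy, made-up pairs for illustration only; not data from the paper.
    pairs = [(-4, True), (10, True), (15, False), (20, True),
             (150, False), (220, False), (180, True), (300, False)]
    labels = [True] * 4 + [False] * 4
    model = train(pairs, labels)
    print(predict(model, 5, True) > 0)     # True: short gap, conserved adjacency
    print(predict(model, 210, False) > 0)  # False: long gap, not conserved
```

Summing the two log-likelihood ratios assumes the features are conditionally independent given the operon label, which is the usual naive Bayes simplification and an assumption here, as is treating distance bins unseen in training as carrying no evidence.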
Pages: 880-892
Number of pages: 13