Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models

被引:59
作者
Nicolas, P
Bize, L
Muri, F
Hoebeke, M
Rodolphe, F
Ehrlich, SD
Prum, B
Bessières, P
机构
[1] INRA, Lab Math Informat & Genome, F-78026 Versailles, France
[2] CNRS, Lab Stat Genome, F-91034 Evry, France
[3] INRA, Lab Genet Microbienne, F-78352 Jouy En Josas, France
关键词
D O I
10.1093/nar/30.6.1418
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We present here the use of a new statistical segmentation method on the Bacillus subtilis chromosome sequence. Maximum likelihood parameter estimation of a hidden Markov model, based on the expectation-maximization algorithm, enables one to segment the DNA sequence according to its local composition. This approach is not based on sliding windows; it enables different compositional classes to be separated without prior knowledge of their content, size and localization. We compared these compositional classes, obtained from the sequence, with the annotated DNA physical map, sequence homologies and repeat regions. The first heterogeneity revealed discriminates between the two coding strands and the non-coding regions. Other main heterogeneities arise; some are related to horizontal gene transfer, some to t-enriched composition of hydrophobic protein coding strands, and others to the codon usage fitness of highly expressed genes. Concerning potential and established gene transfers, we found 9 of the 10 known prophages, plus 14 new regions of atypical composition. Some of them are surrounded by repeats, most of their genes have unknown function or possess homology to genes involved in secondary catabolism, metal and antibiotic resistance. Surprisingly, we notice that all of these detected regions are a + t-richer than the host genome, raising the question of their remote sources.
引用
收藏
页码:1418 / 1426
页数:9
相关论文
共 61 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
Baldi P., 1998, Bioinformatics: The machine learning approach
[3]   STATISTICAL INFERENCE FOR PROBABILISTIC FUNCTIONS OF FINITE STATE MARKOV CHAINS [J].
BAUM, LE ;
PETRIE, T .
ANNALS OF MATHEMATICAL STATISTICS, 1966, 37 (06) :1554-&
[4]   Isochores and the evolutionary genomics of vertebrates [J].
Bernardi, G .
GENE, 2000, 241 (01) :3-17
[5]   GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions [J].
Besemer, J ;
Lomsadze, A ;
Borodovsky, M .
NUCLEIC ACIDS RESEARCH, 2001, 29 (12) :2607-2618
[6]  
Biaudet V, 1997, COMPUT APPL BIOSCI, V13, P431
[7]  
BIZE L, 1999, RECOMB99, P43
[8]   DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES [J].
BORODOVSKY, M ;
MCININCH, JD ;
KOONIN, EV ;
RUDD, KE ;
MEDIGUE, C ;
DANCHIN, A .
NUCLEIC ACIDS RESEARCH, 1995, 23 (17) :3554-3562
[9]   Detecting homogeneous segments in DNA sequences by using hidden Markov models [J].
Boys, RJ ;
Henderson, DA ;
Wilkinson, DJ .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2000, 49 :269-285
[10]  
Braun JV, 1998, STAT SCI, V13, P142