CGUG: In silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

Cited by: 42
Authors
Mahadevan P. [1,2]
King J.F. [1,3]
Seto D. [1]
Affiliations
[1] Department of Bioinformatics and Computational Biology, George Mason University, Manassas, VA 20110
[2] Department of Biological Sciences, Vanderbilt University, Nashville
[3] Kingdomain Corporation, Fairfax, VA 22032
Keywords
Hypothetical Protein; Core Gene; Percent Identity; Data Mining Tool; Francisella tularensis
DOI
10.1186/1756-0500-2-168
Abstract
Background. Viruses and small-genome bacteria (∼2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now being sequenced at an unprecedented rate and require complementary computational tools for their analysis. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms.

Findings. CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based, on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes.

Conclusion. CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small-genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.

© 2009 Seto et al; licensee BioMed Central Ltd.
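The iterative comparison described under Findings can be illustrated with a minimal Python sketch. It is not CGUG's actual implementation: the function name `core_and_unique`, the input layout (best BLASTP score per reference protein, precomputed for each query genome), and the score threshold of 75.0 are all assumptions introduced here for illustration. The sketch keeps a reference protein as "core" only if it has a hit at or above the threshold in every query genome, and reports it as "unique" if it has no such hit in any query genome.

```python
from typing import Dict, List, Tuple


def core_and_unique(
    reference: List[str],
    hits_per_query: Dict[str, Dict[str, float]],
    threshold: float = 75.0,  # placeholder cutoff, not taken from the source
) -> Tuple[List[str], List[str]]:
    """Split reference proteins into core and unique sets.

    reference: protein identifiers of the reference genome.
    hits_per_query: hypothetical layout, query genome name ->
        {reference protein id -> best BLASTP score against that query}.
    """
    # The abstract describes one reference plus up to four query genomes.
    if not 1 <= len(hits_per_query) <= 4:
        raise ValueError("expected between one and four query genomes")

    core = list(reference)      # candidates, narrowed one query at a time
    matched_anywhere = set()    # reference proteins hit in at least one query

    for best_scores in hits_per_query.values():
        matched = {
            protein
            for protein in reference
            if best_scores.get(protein, 0.0) >= threshold
        }
        matched_anywhere |= matched
        # Iterative step: keep only candidates also found in this query.
        core = [protein for protein in core if protein in matched]

    unique = [p for p in reference if p not in matched_anywhere]
    return core, unique
```

Proteins that match some but not all query genomes fall into neither list in this sketch; a fuller tool would presumably report them separately.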