Background: Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation.

Results: We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. Most existing methods scale quadratically with the number of genomes because they rely on pairwise comparisons among predicted protein sequences; DeNoGAP scales linearly because homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available.

Conclusion: DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline was developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package includes a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed.
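The scaling contrast described in the abstract can be illustrated with a rough count of genome-level steps. This is a minimal sketch, not the DeNoGAP implementation: the function names are hypothetical, and it assumes one comparison per genome pair for the all-vs-all approach and one scan of each newly added genome against an existing model set for the iterative approach, ignoring the fact that the model set itself grows.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Genome-vs-genome comparisons in an all-vs-all scheme: n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_scans(n_genomes: int) -> int:
    """Scans in an iterative scheme (assumed): the first genome seeds the
    model set, and every later genome is scanned once against it."""
    return max(n_genomes - 1, 0)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairwise_comparisons(n), iterative_scans(n))
```

Under these assumptions, doubling the number of genomes roughly quadruples the all-vs-all workload but only doubles the number of iterative scans.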
Institutions:
Baltrus, David A.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Nishimura, Marc T.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Romanchuk, Artur: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Chang, Jeff H.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Mukhtar, M. Shahid: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Cherkis, Karen: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA
Roach, Jeff: Univ N Carolina, Ctr Res Comp, Chapel Hill, NC USA
Grant, Sarah R.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA; Univ N Carolina, Curriculum Genet & Mol Biol, Chapel Hill, NC USA
Jones, Corbin D.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA; Univ N Carolina, Curriculum Genet & Mol Biol, Chapel Hill, NC USA; Univ N Carolina, Carolina Ctr Genome Sci, Chapel Hill, NC USA
Dangl, Jeffery L.: Univ N Carolina, Dept Biol, Chapel Hill, NC 27514 USA; Univ N Carolina, Curriculum Genet & Mol Biol, Chapel Hill, NC USA; Univ N Carolina, Carolina Ctr Genome Sci, Chapel Hill, NC USA; Univ N Carolina, Dept Microbiol & Immunol, Chapel Hill, NC USA