BPGA- an ultra-fast pan-genome analysis pipeline

Cited by: 745
Authors
Chaudhari, Narendrakumar M. [1]
Gupta, Vinod Kumar [1]
Dutta, Chitra [1]
Affiliations
[1] Indian Inst Chem Biol, CSIR, Struct Biol & Bioinformat Div, 4 Raja SC Mullick Rd, Kolkata 700032, India
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Keywords
STREPTOCOCCUS-PNEUMONIAE; SEQUENCE; IDENTIFICATION; REVEALS; STRAINS; CORE; PANGENOME; EVOLUTION; INSIGHTS; VACCINE;
DOI
10.1038/srep24373
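Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code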
07 ; 0710 ; 09 ;
Abstract
Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
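The core/accessory/unique partition described in the abstract can be illustrated with a minimal sketch. This is not BPGA's code: the function name `classify_gene_families`, the toy cluster table, and the strain labels are all hypothetical, and the sketch assumes gene clustering has already been done, so each gene family is simply mapped to the set of strains that carry it.

```python
from typing import Dict, List, Set


def classify_gene_families(
    clusters: Dict[str, Set[str]], strains: List[str]
) -> Dict[str, List[str]]:
    """Split gene families into core, accessory and unique groups.

    clusters maps a (hypothetical) family id to the strains carrying it.
    core      = present in every strain
    unique    = present in exactly one strain
    accessory = present in more than one strain but not in all
    """
    strain_set = set(strains)
    total = len(strain_set)
    result: Dict[str, List[str]] = {"core": [], "accessory": [], "unique": []}
    for family, members in sorted(clusters.items()):
        present = len(members & strain_set)
        if present == 0:
            continue  # family absent from the strains under study
        if present == total:
            result["core"].append(family)
        elif present == 1:
            result["unique"].append(family)
        else:
            result["accessory"].append(family)
    return result


# Toy input, invented for illustration only.
toy_clusters = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1", "s2"},
    "famC": {"s3"},
    "famD": {"s2", "s3"},
}
groups = classify_gene_families(toy_clusters, ["s1", "s2", "s3"])
pan_genome_size = sum(len(families) for families in groups.values())
print(groups)
print("pan-genome size:", pan_genome_size)
```

In this toy case the pan-genome is the union of all three groups; a subset analysis, as mentioned in the abstract, would amount to calling the same function with a shorter strain list.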
Pages: 10