GET_HOMOLOGUES, a Versatile Software Package for Scalable and Robust Microbial Pangenome Analysis

Cited by: 664
Authors
Contreras-Moreira, Bruno [1, 2]
Vinuesa, Pablo [3]
Affiliations
[1] CSIC, EEAD, Zaragoza, Spain
[2] Fdn ARAID, Zaragoza, Spain
[3] Univ Nacl Autonoma Mexico, Ctr Ciencias Genom, Cuernavaca 62191, Morelos, Mexico
Keywords
STREPTOCOCCUS-PNEUMONIAE; ORTHOLOGY; GENOMICS; DATABASE; BACTERIAL; CLUSTERS; SEQUENCE;
DOI
10.1128/AEM.02411-13
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
GET_HOMOLOGUES is an open-source software package that builds on popular orthology-calling approaches, making highly customizable and detailed pangenome analyses of microorganisms accessible to nonbioinformaticians. It can cluster homologous gene families using the bidirectional best-hit, COGtriangles, or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domain composition of proteins using the HMMER3 package, by imposing desired pairwise alignment coverage cutoffs, or by selecting only syntenic genes. The resulting homologous gene families can be made even more robust by computing consensus clusters from those generated by any combination of the clustering algorithms and filtering criteria. Auxiliary scripts make the construction, interrogation, and graphical display of core genome and pangenome sets easy to perform. Exponential and binomial mixture models can be fitted to the data to estimate theoretical core genome and pangenome sizes, and high-quality graphics can be generated. Furthermore, pangenome trees can be easily computed and basic comparative genomics performed to identify lineage-specific genes or gene family expansions. The software is designed to take advantage of modern multiprocessor personal computers as well as computer clusters to parallelize time-consuming tasks. To demonstrate some of these capabilities, we survey a set of 50 Streptococcus genomes annotated in the Orthologous Matrix (OMA) browser as a benchmark case. The package can be downloaded at http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.
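As a reading aid for the terms used in the abstract, the following minimal Python sketch shows one simple interpretation of "consensus clusters" (clusters reported identically by every clustering run) and of core genome and pangenome sets (gene families present in all genomes versus in at least one). It is not the package's own code: the function names, the toy data layout, and the strict set-intersection definition of consensus are assumptions made purely for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical data layout (not taken from the package):
# cluster id -> set of genome names that contribute at least one member gene.
Clusters = Dict[str, Set[str]]


def core_and_pan(clusters: Clusters, genomes: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) cluster ids for the chosen genomes.

    core: clusters with a member in every chosen genome.
    pan:  clusters with a member in at least one chosen genome.
    """
    chosen = set(genomes)
    core = {cid for cid, present in clusters.items() if chosen <= present}
    pan = {cid for cid, present in clusters.items() if present & chosen}
    return core, pan


def consensus_clusters(*runs: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Keep only clusters (frozensets of gene ids) that every run reports.

    Assumption: 'consensus' is modelled here as exact agreement between runs.
    """
    if not runs:
        return set()
    agreed = set(runs[0])
    for run in runs[1:]:
        agreed &= set(run)
    return agreed


if __name__ == "__main__":
    toy = {"c1": {"A", "B", "C"}, "c2": {"A", "B"}, "c3": {"C"}}
    core, pan = core_and_pan(toy, ["A", "B", "C"])
    print(sorted(core), sorted(pan))
```

Restricting the genome list (for example to `["A", "B"]`) shows how core and pangenome sizes change as genomes are added or removed, which is the kind of data the fitted models mentioned in the abstract would take as input.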
Pages: 7696-7701
Page count: 6