Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

Cited by: 51
Authors
Abram, Kaleb [1 ]
Udaondo, Zulema [1 ]
Bleker, Carissa [2 ,3 ]
Wanchai, Visanu [1 ]
Wassenaar, Trudy M. [4 ]
Robeson, Michael S., II [1 ]
Ussery, David W. [1 ]
Affiliations
[1] Univ Arkansas Med Sci, Dept Biomed Informat, Little Rock, AR 72205 USA
[2] Univ Tennessee, Bredesen Ctr Interdisciplinary Res & Grad Educ, Knoxville, TN 37996 USA
[3] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[4] Mol Microbiol & Genom Consultants, D-55576 Zotzenheim, Germany
DOI
10.1038/s42003-020-01626-5
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2,613 single-copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in the E. coli species.
Kaleb Abram and Zulema Udaondo et al. analyze over 100,000 publicly available E. coli and Shigella genome sequences and perform a Mash-based analysis to identify 14 unique phylogroups. Their results reveal that most of the sequenced E. coli genomes belong to four distinct phylogroups.
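The medoid-as-proxy idea in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes pairwise genome distances (for example, Mash distances) are already available as a nested dictionary, and the genome names, distance values, and the helper functions `find_medoid` and `classify_by_medoid` are hypothetical. It shows only the general technique: pick the member with the smallest summed distance to the rest of its group, then assign a new genome to the group whose medoid is nearest.

```python
from typing import Dict, Sequence


def find_medoid(members: Sequence[str], dist: Dict[str, Dict[str, float]]) -> str:
    """Return the member with the smallest summed distance to all group members."""
    return min(members, key=lambda g: sum(dist[g][other] for other in members))


def classify_by_medoid(query_to_medoid: Dict[str, float]) -> str:
    """Return the phylogroup label whose medoid is closest to the query genome."""
    return min(query_to_medoid, key=query_to_medoid.get)


# Toy symmetric distance matrix for one hypothetical phylogroup (values invented).
toy_dist = {
    "g1": {"g1": 0.0, "g2": 0.01, "g3": 0.02},
    "g2": {"g1": 0.01, "g2": 0.0, "g3": 0.015},
    "g3": {"g1": 0.02, "g2": 0.015, "g3": 0.0},
}
medoid = find_medoid(["g1", "g2", "g3"], toy_dist)

# Invented distances from one query genome to three phylogroup medoids.
query_distances = {"A": 0.03, "B1": 0.012, "E2(O157)": 0.04}
assigned = classify_by_medoid(query_distances)

print(medoid, assigned)
```

Comparing each new genome against one medoid per phylogroup rather than against every reference genome keeps the number of comparisons small, which is presumably what makes the approach practical for tens of thousands of SRA read sets.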
Pages: 12
Related papers
56 in total
  • [1] Abram K, 2020, SUPPLEMENTARY VIDEO, DOI 10.6084/m9.figshare.11473308
  • [2] Abram K, 2020, SUPPLEMENTARY MOVIE, DOI 10.6084/m9.figshare.13105235
  • [3] Abram K, MASH BASED ANALYSES, DOI 10.5281/zenodo.4091750
  • [4] Pangenome of Serratia marcescens strains from nosocomial and environmental origins reveals different populations and the links between them
    Abreo, Eduardo
    Altier, Nora
    [J]. SCIENTIFIC REPORTS, 2019, 9 (1)
  • [5] Alm EW, 2011, POPULATION GENETICS OF BACTERIA: A TRIBUTE TO THOMAS S. WHITTAM, P69
  • [6] Microreact: visualizing and sharing data for genomic epidemiology and phylogeography
    Argimon, Silvia
    Abudahab, Khalil
    Goater, Richard J. E.
    Fedosejev, Artemij
    Bhai, Jyothish
    Glasner, Corinna
    Feil, Edward J.
    Holden, Matthew T. G.
    Yeats, Corin A.
    Grundmann, Hajo
    Spratt, Brian G.
    Aanensen, David M.
    [J]. MICROBIAL GENOMICS, 2016, 2 (11): e000093
  • [7] Variation in endogenous oxidative stress in Escherichia coli natural isolates during growth in urine
    Aubron, Cecile
    Glodt, Jeremy
    Matar, Corine
    Huet, Olivier
    Borderie, Didier
    Dobrindt, Ulrich
    Duranteau, Jacques
    Denamur, Erick
    Conti, Marc
    Bouvet, Odile
    [J]. BMC MICROBIOLOGY, 2012, 12
  • [8] The Temporal Dynamics of Slightly Deleterious Mutations in Escherichia coli and Shigella spp.
    Balbi, Kevin J.
    Rocha, Eduardo P. C.
    Feil, Edward J.
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2009, 26 (02) : 345 - 355
  • [9] ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping
    Beghain, Johann
    Bridier-Nahmias, Antoine
    Le Nagard, Herve
    Denamur, Erick
    Clermont, Olivier
    [J]. MICROBIAL GENOMICS, 2018, 4 (07)
  • [10] Gene duplications in the E. coli genome: common themes among pathotypes
    Bernabeu, Manuel
    Francisco Sanchez-Herrero, Jose
    Huedo, Pol
    Prieto, Alejandro
    Huttener, Mario
    Rozas, Julio
    Juarez, Antonio
    [J]. BMC GENOMICS, 2019, 20 (1)