GGRaSP: a R-package for selecting representative genomes using Gaussian mixture models

Cited by: 8
Authors
Clarke, Thomas H. [1]
Brinkac, Lauren M. [1,2]
Sutton, Granger [1]
Fouts, Derrick E. [1]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
[2] Durban Univ Technol, Dept Biotechnol & Food Technol, ZA-4000 Durban, South Africa
Funding
US National Institutes of Health
Keywords
DOI
10.1093/bioinformatics/bty300
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The vast number of available sequenced bacterial genomes occasionally exceeds the facilities of comparative genomic methods or is dominated by a single outbreak strain, and thus a diverse and representative subset is required. Generation of the reduced subset currently requires a priori supervised clustering and sequence-only selection of medoid genomic sequences, independent of any additional genome metrics or strain attributes.
Results: The Gaussian Genome Representative Selector with Prioritization (GGRaSP) R-package described below generates a reduced subset of genomes that prioritizes maintaining genomes of interest to the user as well as minimizing the loss of genetic variation. The package also allows for unsupervised clustering by modeling the genomic relationships using a Gaussian mixture model to select an appropriate cluster threshold. We demonstrate the capabilities of GGRaSP by generating a reduced list of 315 genomes from a genomic dataset of 4600 Escherichia coli genomes, prioritizing selection by type strain and by genome completeness.
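The two ideas in the abstract (a Gaussian mixture model used to pick a cluster threshold, and representatives chosen with user priorities) can be illustrated with a minimal Python sketch. This is not the GGRaSP implementation or its R API: the distance matrix, the rank values, the function names, and the greedy selection step are all assumptions made for illustration, and the greedy step stands in for the medoid selection mentioned in the abstract.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    sds = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


def crossover_threshold(weights, means, sds, steps=10000):
    """First point between the two means where the upper component dominates."""
    low, high = (0, 1) if means[0] <= means[1] else (1, 0)
    a, b = means[low], means[high]
    for i in range(steps + 1):
        x = a + (b - a) * i / steps
        d_low = weights[low] * normal_pdf(x, means[low], sds[low])
        d_high = weights[high] * normal_pdf(x, means[high], sds[high])
        if d_high >= d_low:
            return x
    return b


def pick_representatives(dist, ranks, threshold):
    """Greedy stand-in: visit genomes by rank (lower is preferred) and keep
    one only if it is farther than the threshold from every kept genome."""
    order = sorted(range(len(dist)), key=lambda i: (ranks[i], i))
    chosen = []
    for i in order:
        if all(dist[i][j] > threshold for j in chosen):
            chosen.append(i)
    return chosen


# Hypothetical distances for six genomes forming two tight groups.
dist = [
    [0.000, 0.010, 0.015, 0.280, 0.290, 0.300],
    [0.010, 0.000, 0.020, 0.310, 0.320, 0.300],
    [0.015, 0.020, 0.000, 0.290, 0.310, 0.300],
    [0.280, 0.310, 0.290, 0.000, 0.012, 0.018],
    [0.290, 0.320, 0.310, 0.012, 0.000, 0.014],
    [0.300, 0.300, 0.300, 0.018, 0.014, 0.000],
]
ranks = [2, 1, 2, 2, 2, 1]  # hypothetical: genomes 1 and 5 are type strains

pairwise = [dist[i][j] for i in range(6) for j in range(i + 1, 6)]
threshold = crossover_threshold(*fit_two_gaussians(pairwise))
representatives = pick_representatives(dist, ranks, threshold)
print(threshold, representatives)
```

In this toy setting the fitted threshold falls between the within-group and between-group distances, so one genome per group is kept, and the rank list decides which member of each group survives.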
Pages: 3032-3034
Page count: 3
Related papers
17 in total
  • [1] Alneberg J, 2014, NAT METHODS, V11, P1144, DOI 10.1038/nmeth.3103
  • [2] Benaglia T, 2009, J STAT SOFTW, V32, P1
  • [3] Biecek P, 2012, J STAT SOFTW, V47, P1
  • [4] LOCUST: a custom sequence locus typer for classifying microbial isolates
    Brinkac, Lauren M.
    Beck, Erin
    Inman, Jason
    Venepally, Pratap
    Fouts, Derrick E.
    Sutton, Granger
    [J]. BIOINFORMATICS, 2017, 33 (11) : 1725 - 1726
  • [5] A novel method of consensus pan-chromosome assembly and large-scale comparative analysis reveal the highly flexible pan-genome of Acinetobacter baumannii
    Chan, Agnes P.
    Sutton, Granger
    DePew, Jessica
    Krishnakumar, Radha
    Choi, Yongwook
    Huang, Xiao-Zhe
    Beck, Erin
    Harkins, Derek M.
    Kim, Maria
    Lesho, Emil P.
    Nikolich, Mikeljon P.
    Fouts, Derrick E.
    [J]. GENOME BIOLOGY, 2015, 16
  • [6] Comprehensive Genome Analysis of Carbapenemase-Producing Enterobacter spp.: New Insights into Phylogeny, Population Structure, and Resistance Mechanisms
    Chavda, Kalyan D.
    Chen, Liang
    Fouts, Derrick E.
    Sutton, Granger
    Brinkac, Lauren
    Jenkins, Stephen G.
    Bonomo, Robert A.
    Adams, Mark D.
    Kreiswirth, Barry N.
    [J]. MBIO, 2016, 7 (06):
  • [7] Widespread genome duplications throughout the history of flowering plants
    Cui, Liying
    Wall, P. Kerr
    Leebens-Mack, James H.
    Lindsay, Bruce G.
    Soltis, Douglas E.
    Doyle, Jeff J.
    Soltis, Pamela S.
    Carlson, John E.
    Arumuganathan, Kathiravetpilla
    Barakat, Abdelali
    Albert, Victor A.
    Ma, Hong
    dePamphilis, Claude W.
    [J]. GENOME RESEARCH, 2006, 16 (06) : 738 - 749
  • [8] Ihaka R., 2016, colorspace: Color Space Manipulation
  • [9] MetaSort untangles metagenome assembly by reducing microbial community complexity
    Ji, Peifeng
    Zhang, Yanming
    Wang, Jinfeng
    Zhao, Fangqing
    [J]. NATURE COMMUNICATIONS, 2017, 8
  • [10] Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees
    Letunic, Ivica
    Bork, Peer
    [J]. NUCLEIC ACIDS RESEARCH, 2016, 44 (W1) : W242 - W245