GGRaSP: a R-package for selecting representative genomes using Gaussian mixture models

Cited by: 8
Authors
Clarke, Thomas H. [1]
Brinkac, Lauren M. [1,2]
Sutton, Granger [1]
Fouts, Derrick E. [1]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
[2] Durban Univ Technol, Dept Biotechnol & Food Technol, ZA-4000 Durban, South Africa
Funding
US National Institutes of Health
Keywords
DOI
10.1093/bioinformatics/bty300
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The vast number of available sequenced bacterial genomes occasionally exceeds the facilities of comparative genomic methods or is dominated by a single outbreak strain, and thus a diverse and representative subset is required. Generation of the reduced subset currently requires a priori supervised clustering and sequence-only selection of medoid genomic sequences, independent of any additional genome metrics or strain attributes.
Results: The Gaussian Genome Representative Selector with Prioritization (GGRaSP) R-package described below generates a reduced subset of genomes that prioritizes maintaining genomes of interest to the user as well as minimizing the loss of genetic variation. The package also allows for unsupervised clustering by modeling the genomic relationships using a Gaussian mixture model to select an appropriate cluster threshold. We demonstrate the capabilities of GGRaSP by generating a reduced list of 315 genomes from a genomic dataset of 4600 Escherichia coli genomes, prioritizing selection by type strain and by genome completeness.
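The two ideas named in the abstract, deriving a cluster threshold from a Gaussian mixture fitted to pairwise genome distances and then choosing representatives according to user priorities, can be illustrated with a minimal Python sketch. This is not the GGRaSP implementation (GGRaSP is an R package). The function names, the two-component mixture, the greedy selection rule used in place of medoid selection, and the toy distances are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative)."""
    values = sorted(values)
    half = len(values) // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(values[:half]) / half,
             sum(values[half:]) / (len(values) - half)]
    sds = [max(1e-3, (values[-1] - values[0]) / 4.0)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for every value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and standard deviation.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            n_k = sum(r)
            if n_k < 1e-12:
                continue
            weights[k] = n_k / len(values)
            means[k] = sum(ri * x for ri, x in zip(r, values)) / n_k
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / n_k
            sds[k] = max(1e-3, math.sqrt(var))  # floor avoids a zero-width component
    return weights, means, sds

def crossover(weights, means, sds):
    """Bisect between the two means for the point where the weighted
    densities are equal. Assumes component 0 dominates at means[0] and
    component 1 dominates at means[1]."""
    def log_ratio(x):
        parts = []
        for k in (0, 1):
            z = (x - means[k]) / sds[k]
            parts.append(math.log(weights[k]) - math.log(sds[k]) - 0.5 * z * z)
        return parts[0] - parts[1]
    lo, hi = means
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if log_ratio(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def select_representatives(names, dist, threshold, rank):
    """Greedy pick: visit genomes in priority order (lower rank first) and keep
    one only if it is farther than the threshold from every genome kept so far."""
    ordered = sorted(names, key=lambda n: rank.get(n, float("inf")))
    reps = []
    for name in ordered:
        if all(dist[frozenset((name, r))] > threshold for r in reps):
            reps.append(name)
    return reps

# Hypothetical toy data: genomes A, B, C are near-identical, as are D and E.
names = ["A", "B", "C", "D", "E"]
pairs = {("A", "B"): 0.010, ("A", "C"): 0.020, ("B", "C"): 0.015,
         ("D", "E"): 0.012, ("A", "D"): 0.300, ("A", "E"): 0.310,
         ("B", "D"): 0.290, ("B", "E"): 0.320, ("C", "D"): 0.305,
         ("C", "E"): 0.315}
dist = {frozenset(p): d for p, d in pairs.items()}
rank = {"B": 0, "E": 1}  # e.g. a type strain first, then a complete genome

weights, means, sds = fit_two_gaussians(list(dist.values()))
threshold = crossover(weights, means, sds)
reps = select_representatives(names, dist, threshold, rank)
print(threshold, reps)
```

In this toy setup the threshold falls in the gap between the within-group and between-group distances, so each group contributes one representative, and the ranked genomes are the ones retained.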
Pages: 3032-3034
Page count: 3