EzBioCloud: a genome-driven database and platform for microbiome identification and discovery

Cited by: 41
Authors
Chalita, Mauricio [1]
Kim, Yeong Ouk [1,2]
Park, Sein [1,2]
Oh, Hyun-Seok [1]
Cho, Jae Hyoung [1]
Moon, Jeongsup [1]
Baek, Nuga [1]
Moon, Changsik [1]
Lee, Kihyun [1]
Yang, Junwon [1,2]
Nam, Gi Gyun [1]
Jung, Yeonjae [1,2]
Na, Seong-In [1]
Bailey, Michael James [1]
Chun, Jongsik [1]
Affiliations
[1] CJ Biosci Inc, Seoul 04527, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Bioinformat, Seoul 08826, South Korea
Keywords
core genes; database; identification; microbiome; species; RIBOSOMAL-RNA; ALIGNMENT
DOI
10.1099/ijsem.0.006421
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
With the continued evolution of DNA sequencing technologies, genome sequence data have become more integral to the classification and identification of Bacteria and Archaea. Six years after introducing EzBioCloud, an integrated platform representing the taxonomic hierarchy of Bacteria and Archaea through quality-controlled 16S rRNA gene and genome sequences, we present an updated version that further refines and expands its capabilities. The current update recognizes the growing need for accurate taxonomic information as defining a species increasingly relies on genome sequence comparisons. We also incorporated an advanced strategy for addressing underrepresented or less studied lineages, bolstering the comprehensiveness and accuracy of our database. Our rigorous quality control protocols remain in place: whole-genome assemblies from the NCBI Assembly Database undergo stringent screening to remove low-quality sequence data. The screened assemblies are then passed through our enhanced bioinformatics identification pipeline, which begins with a 16S rRNA gene similarity search and then calculates the average nucleotide identity (ANI). For genome sequences that lack a 16S rRNA sequence and have no closely related genomic representative for ANI calculation, we apply a different ANI approach that uses bacterial core genes for improved taxonomic placement (core gene ANI, cgANI). Because of the increase in genome sequences available in NCBI and our newly introduced cgANI method, EzBioCloud now encompasses a total of 109 835 species. Of these, 21 964 have validly published names, 47 896 are candidate species identified either through 16S rRNA sequence similarity (phylotypes) or through whole-genome ANI (genomospecies), and the remaining 39 975 were positioned in the taxonomic tree by cgANI (species clusters). Our EzBioCloud database is accessible at www.ezbiocloud.net/db.
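The routing rule in the abstract, and the species breakdown it reports, can be illustrated with a minimal Python sketch. This is not the EzBioCloud pipeline: the function names `choose_route` and `same_species` are hypothetical, and the 95% ANI cutoff is an assumed, commonly used species boundary that the abstract does not state. The abstract also does not say how genomes with only one of the two kinds of evidence are handled, so the sketch sends them down the standard route.

```python
from typing import Optional

# Assumed species cutoff in percent; the abstract gives no threshold.
ANI_SPECIES_CUTOFF = 95.0


def choose_route(has_16s: bool, has_close_genome: bool) -> str:
    """Pick a comparison route following the rule stated in the abstract.

    cgANI is used only when the genome lacks a 16S rRNA sequence AND has
    no closely related genomic representative. Mixed cases are not
    specified in the abstract; here they default to the standard route.
    """
    if not has_16s and not has_close_genome:
        return "cgANI"
    return "16S search then ANI"


def same_species(identity: Optional[float],
                 cutoff: float = ANI_SPECIES_CUTOFF) -> bool:
    """True if an identity value (percent) meets the assumed cutoff."""
    return identity is not None and identity >= cutoff


# Species counts as reported in the abstract.
counts = {
    "validly published names": 21_964,
    "candidate species (phylotypes and genomospecies)": 47_896,
    "species clusters placed by cgANI": 39_975,
}
total = sum(counts.values())

if __name__ == "__main__":
    print(choose_route(has_16s=False, has_close_genome=False))
    print(choose_route(has_16s=True, has_close_genome=True))
    print(total)
```

The three reported categories sum to the stated total of 109 835 species, which the `total` variable reproduces.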
Pages: 6