MIMt: a curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification

Cited by: 0
Authors
Cabezas, M. Pilar [1, 2]
Fonseca, Nuno A. [3, 4]
Munoz-Merida, Antonio [3, 4]
Affiliations
[1] Univ Minho, Ctr Mol & Environm Biol CBMA, Dept Biol, Campus Gualtar, P-4710057 Braga, Portugal
[2] Univ Minho, Inst Sci & Innovat Biosustainabil IB S, Campus Gualtar, P-4710057 Braga, Portugal
[3] CIBIO InBIO, Res Ctr Biodivers & Genet Resources, P-4485661 Vairao, Portugal
[4] CIBIO, BIOPOLIS Program Genom Biodivers & Land Planning, Campus Vairao, P-4485661 Vairao, Portugal
Funding
EU Horizon 2020;
Keywords
GENE DATABASE; BACTERIAL; SILVA; ASSIGNMENT; GREENGENES; CONSISTENT; TAXONOMY;
DOI
10.1186/s40793-024-00634-w
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Motivation: Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major challenges in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which contain either substantial redundancy or a high percentage of sequences with missing taxonomic information. These limitations may lead to erroneous identifications and, in turn, to inaccurate conclusions about the ecological role and importance of those microorganisms in the ecosystem.
Results: The current study presents MIMt, a new 16S rRNA database for the identification of archaea and bacteria, comprising 47 001 sequences, all precisely identified at the species level. In addition, a MIMt2.0 version containing 32 086 sequences was created using only curated sequences from RefSeq Targeted Loci. MIMt is intended to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy, enabling more precise assignments at lower taxonomic ranks and thus significantly improving species-level identification.
Pages: 13