Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics

Cited by: 77
Authors
Escobar-Zepeda, Alejandra [1]
Ernestina Godoy-Lozano, Elizabeth [1]
Raggi, Luciana [1]
Segovia, Lorenzo [1, 2]
Merino, Enrique [1, 2]
Maria Gutierrez-Rios, Rosa [1, 2]
Juarez, Katy [1, 2]
Licea-Navarro, Alexei F. [1, 3]
Pardo-Lopez, Liliana [1, 2]
Sanchez-Flores, Alejandro [1, 2]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Biotecnol, Consorcio Invest Golfo Mexico CIGOM, Cuernavaca, Mexico
[2] Univ Nacl Autonoma Mexico, Inst Biotecnol, Cuernavaca, Mexico
[3] CICESE, Dept Innovac Biomed, Carretera Ensenada Tijuana 3918, Zona Playitas, Ensenada, Baja California, Mexico
Keywords
RIBOSOMAL-RNA; CLASSIFICATION; IDENTIFICATION; DIVERSITY; BACTERIA
DOI
10.1038/s41598-018-30515-5
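Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract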
Metagenomics research has recently thrived thanks to improvements in DNA sequencing technologies, which have driven the emergence of new analysis tools and the growth of taxonomic databases. However, no all-purpose strategy can guarantee the best result for a given project, and many combinations of software, parameters and databases can be tested. We therefore performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, and defined a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to selecting the best tool, together with all datasets and scripts needed to reproduce our results and benchmark any new method, is available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results for a better and more accurate description of microbial diversity.
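The idea of choosing a score cut-off from its expected error rate and coverage can be illustrated with a minimal sketch. This is not the authors' implementation (their scripts are in the repository linked above). The function name, the input layout and the exact definitions used here are assumptions made for illustration: coverage is taken as the fraction of all simulated reads that are correctly classified at or above the cut-off, and error rate as the fraction of all simulated reads that are incorrectly classified at or above it.

```python
from typing import List, Tuple


def coverage_and_error_rate(
    predictions: List[Tuple[float, bool]], cutoff: float
) -> Tuple[float, float]:
    """Return (coverage, error_rate) at a given score cut-off.

    predictions: one (score, is_correct) pair per simulated read, where
    is_correct says whether the assigned taxon matches the known origin.
    Both definitions below are illustrative assumptions, not the paper's.
    """
    total = len(predictions)
    if total == 0:
        raise ValueError("predictions must not be empty")

    # Keep only the classifications whose score passes the cut-off.
    accepted = [is_correct for score, is_correct in predictions if score >= cutoff]

    correct = sum(accepted)            # accepted and right
    errors = len(accepted) - correct   # accepted but wrong

    coverage = correct / total         # correct accepted calls per read
    error_rate = errors / total        # wrong accepted calls per read
    return coverage, error_rate


# Hypothetical toy data: (classifier score, matches the simulated truth?)
toy = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]
for cutoff in (0.0, 0.5, 0.85):
    cov, err = coverage_and_error_rate(toy, cutoff)
    print(f"cutoff={cutoff:.2f} coverage={cov:.2f} error_rate={err:.2f}")
```

Because the two quantities depend only on whether each call is right or wrong once the cut-off is applied, the same procedure can be used with any tool's native score, provided higher scores mean more confident assignments.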
Pages: 13