MICCA: a complete and accurate software for taxonomic profiling of metagenomic data

Cited by: 196
Authors
Albanese, Davide [1 ]
Fontana, Paolo [1 ]
De Filippo, Carlotta [2 ]
Cavalieri, Duccio [1 ]
Donati, Claudio [1 ]
Affiliations
[1] Fdn Edmund Mach, Res & Innovat Ctr, Computat Biol Dept, I-38010 San Michele All Adige, TN, Italy
[2] Fdn Edmund Mach, Res & Innovat Ctr, Food Qual Nutr & Hlth Dept, I-38010 San Michele All Adige, TN, Italy
Source
SCIENTIFIC REPORTS | 2015 / Vol. 5
Keywords
SEQUENCES; AMPLICON;
DOI
10.1038/srep09743
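Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Subject classification code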
07 ; 0710 ; 09 ;
Abstract
The introduction of high-throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results while reaching a good compromise between modularity and usability. Moreover, we introduce a de novo clustering algorithm specifically designed for the inference of OTUs. Tests on real and synthetic datasets show that, thanks to the optimized read filtering process and the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human-associated microbial communities. MICCA is an open source project.
Pages: 7
Related papers
21 in total
  • [1] Grinder: a versatile amplicon and shotgun sequence simulator
    Angly, Florent E.
    Willner, Dana
    Rohwer, Forest
    Hugenholtz, Philip
    Tyson, Gene W.
    [J]. NUCLEIC ACIDS RESEARCH, 2012, 40 (12)
  • [2] [Anonymous], 2012, Nature
  • [3] Bacterial diversity, community structure and potential growth rates along an estuarine salinity gradient
    Campbell, Barbara J.
    Kirchman, David L.
    [J]. ISME JOURNAL, 2013, 7 (01) : 210 - 220
  • [4] QIIME allows analysis of high-throughput community sequencing data
    Caporaso, J. Gregory
    Kuczynski, Justin
    Stombaugh, Jesse
    Bittinger, Kyle
    Bushman, Frederic D.
    Costello, Elizabeth K.
    Fierer, Noah
    Pena, Antonio Gonzalez
    Goodrich, Julia K.
    Gordon, Jeffrey I.
    Huttley, Gavin A.
    Kelley, Scott T.
    Knights, Dan
    Koenig, Jeremy E.
    Ley, Ruth E.
    Lozupone, Catherine A.
    McDonald, Daniel
    Muegge, Brian D.
    Pirrung, Meg
    Reeder, Jens
    Sevinsky, Joel R.
    Turnbaugh, Peter J.
    Walters, William A.
    Widmann, Jeremy
    Yatsunenko, Tanya
    Zaneveld, Jesse
    Knight, Rob
    [J]. NATURE METHODS, 2010, 7 (05) : 335 - 336
  • [5] PyNAST: a flexible tool for aligning sequences to a template alignment
    Caporaso, J. Gregory
    Bittinger, Kyle
    Bushman, Frederic D.
    DeSantis, Todd Z.
    Andersen, Gary L.
    Knight, Rob
    [J]. BIOINFORMATICS, 2010, 26 (02) : 266 - 267
  • [6] Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB
    DeSantis, T. Z.
    Hugenholtz, P.
    Larsen, N.
    Rojas, M.
    Brodie, E. L.
    Keller, K.
    Huber, T.
    Dalevi, D.
    Hu, P.
    Andersen, G. L.
    [J]. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 2006, 72 (07) : 5069 - 5072
  • [7] Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation
    Dethlefsen, Les
    Relman, David A.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 : 4554 - 4561
  • [8] MUSCLE: multiple sequence alignment with high accuracy and high throughput
    Edgar, RC
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 (05) : 1792 - 1797
  • [9] Edgar RC, 2013, NAT METHODS, V10, P996, DOI 10.1038/nmeth.2604
  • [10] UCHIME improves sensitivity and speed of chimera detection
    Edgar, Robert C.
    Haas, Brian J.
    Clemente, Jose C.
    Quince, Christopher
    Knight, Rob
    [J]. BIOINFORMATICS, 2011, 27 (16) : 2194 - 2200