Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit

Cited by: 104
Authors
Preheim, Sarah P. [1]
Perrotta, Allison R. [2]
Martin-Platero, Antonio M. [2]
Gupta, Anika [2]
Alm, Eric J. [1]
Affiliations
[1] MIT, Dept Biol Engn, Cambridge, MA 02139 USA
[2] MIT, Dept Civil & Environm Engn, Cambridge, MA 02139 USA
Keywords
GLOBAL PATTERNS; SEQUENCING DATA; RARE BIOSPHERE; DIVERSITY; DIFFERENTIATION; ALGORITHMS; WRINKLES; PCR;
DOI
10.1128/AEM.00342-13
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence data alone, but both of these activities could be informed by the distribution of sequences across samples. Here, we show that using the distribution of sequences across samples can help identify population boundaries even in noisy sequence data. The logic underlying our approach is that bacteria in different populations will often be highly correlated in their abundance across different samples. Conversely, 16S rRNA sequences derived from the same population, whether slightly different copies in the same organism, variation of the 16S rRNA gene within a population, or sequences generated randomly in error, will have the same underlying distribution across sampled environments. We present a simple OTU-calling algorithm (distribution-based clustering) that uses both genetic distance and the distribution of sequences across samples and demonstrate that it is more accurate than other methods at grouping reads into OTUs in a mock community. Distribution-based clustering also performs well on environmental samples: it is sensitive enough to differentiate between OTUs that differ by a single base pair yet predicts fewer overall OTUs than most other methods. The program can decrease the total number of OTUs with redundant information and improve the power of many downstream analyses to describe biologically relevant trends.
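The idea in the abstract, combining genetic distance with the distribution of sequences across samples, can be illustrated with a minimal sketch. This is not the authors' program: the function names, the greedy abundance-ordered merging, the Hamming distance on equal-length reads, and the Jensen-Shannon divergence cutoff (`max_jsd`) are all assumptions chosen for illustration, and the divergence cutoff stands in for whatever statistical test a real implementation would apply.

```python
import math


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, range 0..1) of two count vectors.

    Assumes each vector has a non-zero total.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def cluster(seqs, max_dist=1, max_jsd=0.05):
    """Greedy sketch: seqs maps sequence -> per-sample read counts.

    A sequence joins an existing OTU only if it is genetically close to
    the OTU's most abundant member AND has a similar distribution across
    samples; otherwise it seeds a new OTU. Thresholds are hypothetical.
    """
    ordered = sorted(seqs, key=lambda s: sum(seqs[s]), reverse=True)
    otus = []
    for s in ordered:
        for otu in otus:
            rep = otu[0]  # most abundant member acts as representative
            if (hamming(s, rep) <= max_dist
                    and js_divergence(seqs[s], seqs[rep]) <= max_jsd):
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus


# Hypothetical counts across three samples.
reads = {
    "ACGTACGT": [900, 100, 10],  # abundant sequence
    "ACGTACGA": [9, 1, 0],       # one mismatch, same distribution
    "ACGTACGC": [5, 80, 700],    # one mismatch, different distribution
}
otus = cluster(reads)
```

In this toy input, the low-abundance variant that tracks the abundant sequence across samples is merged with it, as a sequencing error or within-population variant would be, while the variant with a distinct distribution is kept as its own OTU even though it differs by a single base.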
Pages: 6593-6603
Page count: 11