A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Cited by: 61
Authors
Keegan, Kevin P. [1, 2, 3]
Trimble, William L. [1]
Wilkening, Jared [1, 2, 3]
Wilke, Andreas [1, 2, 3]
Harrison, Travis [1, 2, 3]
D'Souza, Mark [1, 2, 3]
Meyer, Folker [1, 2, 3]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Univ Chicago, Chicago, IL 60637 USA
[3] Inst Genom & Syst Biol, Chicago, IL USA
Keywords
QUALITY ASSESSMENT; RARE BIOSPHERE; SHORT-READ; DIVERSITY; WRINKLES
DOI
10.1371/journal.pcbi.1002541
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in studies that use data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or on quality scores (e.g., Phred). Here, DRISEE is applied to non-amplicon data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and use it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
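The idea of estimating error from duplicate reads can be illustrated with a minimal Python sketch. It is not the published DRISEE implementation: the function names, the prefix length, and the minimum bin size are hypothetical choices made for illustration, and the abstract does not specify how ADRs are identified. The sketch assumes that reads sharing an identical prefix are candidate duplicates, takes the most common base at each later position as the consensus, and counts substitutions only (insertions and deletions are ignored).

```python
from collections import Counter, defaultdict


def bin_by_prefix(reads, prefix_len):
    """Group reads sharing an identical prefix (assumed proxy for duplicate reads)."""
    bins = defaultdict(list)
    for read in reads:
        if len(read) >= prefix_len:
            bins[read[:prefix_len]].append(read)
    return bins


def mismatch_counts(reads, prefix_len=4, min_bin_size=3):
    """Count per-position mismatches against each bin's consensus base.

    Returns two Counters keyed by read position: mismatches and bases observed.
    Only positions after the prefix are scored, and substitutions only.
    """
    mismatches, observed = Counter(), Counter()
    for members in bin_by_prefix(reads, prefix_len).values():
        if len(members) < min_bin_size:
            continue  # too few duplicates to call a consensus
        for pos in range(prefix_len, max(len(r) for r in members)):
            bases = [r[pos] for r in members if len(r) > pos]
            if len(bases) < min_bin_size:
                continue
            consensus = Counter(bases).most_common(1)[0][0]
            mismatches[pos] += sum(base != consensus for base in bases)
            observed[pos] += len(bases)
    return mismatches, observed


def positional_error(reads, prefix_len=4, min_bin_size=3):
    """Per-position error rate, usable as a guide for read trimming."""
    mismatches, observed = mismatch_counts(reads, prefix_len, min_bin_size)
    return {pos: mismatches[pos] / observed[pos] for pos in sorted(observed)}


def global_error(reads, prefix_len=4, min_bin_size=3):
    """Whole-sample error rate: total mismatches over total scored bases."""
    mismatches, observed = mismatch_counts(reads, prefix_len, min_bin_size)
    total = sum(observed.values())
    return sum(mismatches.values()) / total if total else 0.0


if __name__ == "__main__":
    toy_reads = ["ACGTAAAAA", "ACGTAAAAA", "ACGTAAAAA", "ACGTAAATA"]
    print(positional_error(toy_reads))
    print(global_error(toy_reads))
```

In this toy input, one of four duplicate reads differs from the consensus at a single position, so the positional estimate is nonzero only there, and the global estimate is that one mismatch divided by all scored bases.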
Pages: 11