A Filtering Method to Generate High Quality Short Reads Using Illumina Paired-End Technology

Cited by: 256
Authors
Eren, A. Murat [1]
Vineis, Joseph H. [1]
Morrison, Hilary G. [1]
Sogin, Mitchell L. [1]
Affiliations
[1] Josephine Bay Paul Ctr Comparat Mol Biol & Evolut, Marine Biol Lab, Woods Hole, MA USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
RARE BIOSPHERE; MICROBIAL DIVERSITY; SEQUENCES; WRINKLES;
DOI
10.1371/journal.pone.0066643
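Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code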
07 ; 0710 ; 09 ;
Abstract
Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, a lack of consensus between very similar sequences in metagenomic studies can, and often does, represent natural variation of biological significance. Machine-assigned quality scores, commonly used on next-generation platforms, do not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.
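The overlap idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the illumina-utils API: the function names, the assumption that the overlap length is already known, and the zero-mismatch default threshold are all hypothetical choices made for illustration. The sketch reverse-complements the second read, compares it to the end of the first read across the overlap, and flags pairs whose overlapping bases disagree.

```python
# Illustrative sketch only; names and thresholds are assumptions,
# not taken from the paper or from illumina-utils.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(read1, read2, overlap_len):
    """Count disagreements between the two reads within their overlap.

    read1 is the forward read; read2 is the reverse read as sequenced.
    overlap_len is assumed to be known in advance (hypothetical input).
    """
    if overlap_len <= 0 or overlap_len > min(len(read1), len(read2)):
        raise ValueError("overlap_len must fit inside both reads")
    read2_fwd = reverse_complement(read2)
    tail = read1[-overlap_len:]       # last bases of the forward read
    head = read2_fwd[:overlap_len]    # first bases of the reoriented reverse read
    return sum(a != b for a, b in zip(tail, head))


def passes_filter(read1, read2, overlap_len, max_mismatches=0):
    """Keep a pair only if its overlap has at most max_mismatches disagreements."""
    return overlap_mismatches(read1, read2, overlap_len) <= max_mismatches
```

For example, with a forward read `ACGTACGTAA` and a reverse read `GGCCTTACGT` that share a 6-base overlap, the pair agrees at every overlapping position and is kept; changing the last base of the reverse read to `A` introduces one mismatch and the pair is discarded under the default threshold.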
Pages: 6