PANDAseq: PAired-eND Assembler for Illumina sequences

Cited by: 1623
Authors
Masella, Andre P. [1]
Bartram, Andrea K. [1]
Truszkowski, Jakub M. [2]
Brown, Daniel G. [2]
Neufeld, Josh D. [1]
Affiliations
[1] Univ Waterloo, Dept Biol, Waterloo, ON N2L 3G1, Canada
[2] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Source
BMC Bioinformatics | 2012 / Volume 13
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
16S RIBOSOMAL-RNA; DIVERSITY;
DOI
10.1186/1471-2105-13-31
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naive assembly with negligible loss of "good" sequence.
Pages: 7