Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing

Cited by: 269
Authors
Gilles, Andre [2]
Meglecz, Emese [2]
Pech, Nicolas [2]
Ferreira, Stephanie [3]
Malausa, Thibaut [4]
Martin, Jean-Francois [1]
Affiliations
[1] INRA IRD Cirad Montpellier SupAgro, CBGP, UMR, Campus Int Baillarguet,CS 30016, F-34988 Montferrier Sur Lez, France
[2] Aix Marseille Univ, CNRS, IRD,Ctr St Charles, UMR IMEP 6116,Equipe Evolut Genome Environm, F-13331 Marseille 3, France
[3] Genoscreen, Genom Platform & R&D, F-59000 Lille, France
[4] INRA, UMR 1301, Equipe BPI, F-06903 Sophia Antipolis, France
Source
BMC GENOMICS | 2011 / Vol. 12
Keywords
RARE BIOSPHERE; NEW-GENERATION; DISCOVERY; DIVERSITY; WRINKLES; ERRORS; RATES;
DOI
10.1186/1471-2164-12-245
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007, for the older GS20 system based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments.
Results: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables.
Conclusions: The pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.
Pages: 11