An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome

Cited by: 130
Authors
Ferrarini, Marco [1]
Moretto, Marco [1]
Ward, Judson A. [2]
Surbanovski, Nada [1]
Stevanovic, Vladimir [3]
Giongo, Lara [1]
Viola, Roberto [1]
Cavalieri, Duccio [1]
Velasco, Riccardo [1]
Cestaro, Alessandro [1]
Sargent, Daniel J. [1]
Affiliations
[1] Fdn Edmund Mach, Res & Innovat Ctr, I-38010 San Michele All Adige, Italy
[2] Driscolls, Watsonville, CA 95077 USA
[3] Univ Belgrade, Fac Biol, Inst Bot & Bot Garden, Belgrade 11000, Serbia
Source
BMC GENOMICS | 2013 / Vol. 14
Keywords
Third-generation sequencing; NGen; Genomics; Assembly; Annotation; Oxford nanopore; Pacific BioSciences; Roche; 454; PACIFIC BIOSCIENCES; SHORT READS; TECHNOLOGY; GENERATION; OUTBREAK; REPEATS; STRAIN; SETS; TOOL; DNA;
DOI
10.1186/1471-2164-14-670
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Second-generation sequencing has permitted detailed whole-genome sequence characterisation of a growing number of non-model organisms, but the data produced have short read lengths and biased genome coverage, leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of longer reads and unbiased genome coverage, and thus the potential to produce finished-quality genome sequence data with fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome.

Results: Following error correction, a total of 28,638 PacBio RS reads were recovered, with a mean read length of 1,902 bp, totalling 54,492,250 nucleotides and representing an average depth of coverage of 320x across the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage), compared with seven contigs (90.59% coverage) recovered from an Illumina dataset, and revealed no bias in the coverage of GC-rich regions. After assembly, the PacBio data were largely concordant with the Illumina data and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences between the two inverted repeat regions to be assigned unambiguously.

Conclusions: To our knowledge, this is the first report of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, owing to longer read lengths and lower GC bias in the data.
The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.
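The coverage figures in the Results can be checked with the standard average-depth relation C = N × L / G, where N is the number of reads, L the mean read length and G the genome length (so N × L is simply the total number of sequenced bases). The Python snippet below is a minimal sketch of that arithmetic, not the authors' pipeline; the constant and function names are illustrative. Run on the reported totals, it gives a mean read length of about 1,902.8 bp, consistent with the stated 1,902 bp, but a naive depth of roughly 352x rather than the reported 320x. The abstract does not say how depth was calculated; one possible, unconfirmed explanation is that only reads aligning to the chloroplast assembly were counted.

```python
# Totals reported in the abstract (P. micrantha chloroplast, PacBio RS).
READ_COUNT = 28_638        # error-corrected reads
TOTAL_BASES = 54_492_250   # nucleotides in those reads
GENOME_LENGTH = 154_959    # assembled chloroplast genome, bp


def mean_read_length(total_bases: int, read_count: int) -> float:
    """Average read length in bp: total bases divided by number of reads."""
    return total_bases / read_count


def mean_depth(total_bases: int, genome_length: int) -> float:
    """Naive average depth of coverage, C = N * L / G = total bases / G.

    Assumption: every sequenced base aligns to the genome. Reads that do not
    map (e.g. nuclear or mitochondrial DNA) would inflate this estimate.
    """
    return total_bases / genome_length


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BASES, READ_COUNT):.1f} bp")
    print(f"naive mean depth: {mean_depth(TOTAL_BASES, GENOME_LENGTH):.1f}x")
```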
Pages: 11