SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

Cited by: 387
Authors
Boetzer, Marten [1]
Pirovano, Walter [1]
Affiliations
[1] BaseClear BV, Genome Anal & Technol Dept, NL-2333 CC Leiden, Netherlands
Keywords
De novo assembly; Scaffolding; Single molecule sequencing; Pacific Biosciences; Genome finishing; Basic local alignment; Assemblies; Algorithm
DOI
10.1186/1471-2105-15-211
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, each assembled using either a single Illumina MiSeq or Roche 454 library, we show that even 50x coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate that our strategy is better able to produce (nearly) complete bacterial genomes. Conclusions: The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances in PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner.
Pages: 9
Related papers
24 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]  
[Anonymous], 2013, J DATA MINING GENOMI
[3]   Improving PacBio Long Read Accuracy by Short Read Alignment [J].
Au, Kin Fai ;
Underwood, Jason G. ;
Lee, Lawrence ;
Wong, Wing Hung .
PLOS ONE, 2012, 7 (10)
[4]   SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing [J].
Bankevich, Anton ;
Nurk, Sergey ;
Antipov, Dmitry ;
Gurevich, Alexey A. ;
Dvorkin, Mikhail ;
Kulikov, Alexander S. ;
Lesin, Valery M. ;
Nikolenko, Sergey I. ;
Son Pham ;
Prjibelski, Andrey D. ;
Pyshkin, Alexey V. ;
Sirotkin, Alexander V. ;
Vyahhi, Nikolay ;
Tesler, Glenn ;
Alekseyev, Max A. ;
Pevzner, Pavel A. .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (05) :455-477
[5]   Scaffolding pre-assembled contigs using SSPACE [J].
Boetzer, Marten ;
Henkel, Christiaan V. ;
Jansen, Hans J. ;
Butler, Derek ;
Pirovano, Walter .
BIOINFORMATICS, 2011, 27 (04) :578-579
[6]   Ray: Simultaneous Assembly of Reads from a Mix of High-Throughput Sequencing Technologies [J].
Boisvert, Sebastien ;
Laviolette, Francois ;
Corbeil, Jacques .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2010, 17 (11) :1519-1533
[7]   Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory [J].
Chaisson, Mark J. ;
Tesler, Glenn .
BMC BIOINFORMATICS, 2012, 13
[8]  
CHEVREUX B., 1999, Proceedings of the German Conference on Bioinformatics GCB, V99, P45
[9]  
Chin CS, 2013, NAT METHODS, V10, P563, DOI [10.1038/NMETH.2474, 10.1038/nmeth.2474]
[10]   SOPRA: Scaffolding algorithm for paired reads via statistical optimization [J].
Dayarian, Adel ;
Michael, Todd P. ;
Sengupta, Anirvan M. .
BMC BIOINFORMATICS, 2010, 11