Assembling short reads from jumping libraries with large insert sizes

Cited by: 43
Authors
Vasilinetc, Irina [1]
Prjibelski, Andrey D. [1, 2]
Gurevich, Alexey [1, 2]
Korobeynikov, Anton [1, 2, 3]
Pevzner, Pavel A. [2, 4]
Affiliations
[1] St Petersburg Acad Univ, Algorithm Biol Lab, St Petersburg 194021, Russia
[2] St Petersburg State Univ, Ctr Algorithm Biotechnol, Inst Translat Biomed, St Petersburg 199004, Russia
[3] St Petersburg State Univ, Dept Math & Mech, St Petersburg 198504, Russia
[4] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92093 USA
Funding
Russian Science Foundation
Keywords
COMPLETE GENOME SEQUENCE; PAIRED READS; ALGORITHMS
DOI
10.1093/bioinformatics/btv337
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Advances in Next-Generation Sequencing technologies and sample preparation have recently enabled the generation of high-quality jumping libraries that have the potential to significantly improve short-read assemblies. However, assembly algorithms must catch up with these experimental innovations in order to benefit from them and produce high-quality assemblies.
Results: We present a new algorithm that extends the recently described EXSPANDER universal repeat resolution approach, enabling its application to several challenging data types, including jumping libraries generated by the recently developed Illumina Nextera Mate Pair protocol. We demonstrate that, with these improvements, bacterial genomes can often be assembled into a few contigs using only a single Nextera Mate Pair library of short reads.
Pages: 3262-3268
Page count: 7