Assembly of long error-prone reads using de Bruijn graphs

Cited by: 210
Authors
Lin, Yu [1]
Yuan, Jeffrey [1]
Kolmogorov, Mikhail [1]
Shen, Max W. [1]
Chaisson, Mark [2]
Pevzner, Pavel A. [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92092 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98105 USA
Keywords
de Bruijn graph; genome assembly; single-molecule sequencing; GENOMES; ALGORITHMS; BACTERIAL; SEQUENCE; CLASSIFICATION; CHROMOSOME; TOOL
DOI
10.1073/pnas.1604560113
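Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract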
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
Pages: E8396-E8405
Page count: 10