cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs

Cited by: 19
Authors
Tolstoganov, Ivan [1]
Bankevich, Anton [2]
Chen, Zhoutao [3]
Pevzner, Pavel A. [1, 2]
Affiliations
[1] St Petersburg State Univ, Inst Translat Biomed, Ctr Algorithm Biotechnol, St Petersburg, Russia
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[3] Universal Sequencing Technol Corp, Carlsbad, CA USA
Funding
Russian Science Foundation
Keywords
DNA EXTRACTION; GENOME; ACCURATE;
DOI
10.1093/bioinformatics/btz349
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although new barcoding protocols are emerging and the range of SLR applications is expanding, existing SLR assemblers are optimized for a narrow range of parameters and are not easily extended to new barcoding technologies or to new applications such as metagenomics and hybrid assembly.

Results: We describe the algorithmic challenge of SLR assembly and present cloudSPAdes, an algorithm for SLR assembly based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies and applications and demonstrated that it improves on state-of-the-art SLR assemblers in accuracy and speed.

Availability and implementation: Source code and an installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper.

Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: i61-i70
Number of pages: 10