cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs

Times cited: 19
Authors
Tolstoganov, Ivan [1]
Bankevich, Anton [2]
Chen, Zhoutao [3]
Pevzner, Pavel A. [1,2]
Affiliations
[1] Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
[2] Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
[3] Universal Sequencing Technology Corporation, Carlsbad, CA, USA
Funding
Russian Science Foundation
Keywords
DNA extraction; genome; accurate
DOI
10.1093/bioinformatics/btz349
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although new barcoding protocols are emerging and the range of SLR applications is expanding, existing SLR assemblers are optimized for a narrow range of parameters and are not easily extendable to new barcoding technologies or to new applications such as metagenomics and hybrid assembly.
Results: We describe the algorithmic challenge of SLR assembly and present cloudSPAdes, an SLR assembly algorithm based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies and applications and demonstrated that it improves on state-of-the-art SLR assemblers in both accuracy and speed.
Availability and implementation: Source code and an installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper.
Supplementary information: Supplementary data are available at Bioinformatics online.
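The abstract states only that cloudSPAdes works on the de Bruijn graph of SLRs. As a generic, hypothetical illustration of that data structure (not cloudSPAdes's code or data model), the Python sketch below builds a de Bruijn graph from barcoded short reads: nodes are (k-1)-mers, each distinct k-mer is an edge from its prefix to its suffix, and every edge records the barcodes of the reads that contain it. The (barcode, sequence) input format and all function names are invented for this example, and the sketch ignores reverse complements, sequencing errors and graph compaction.

```python
from collections import defaultdict


def build_barcoded_dbg(reads, k):
    """Map each k-mer (an edge of the de Bruijn graph) to the set of barcodes
    of the reads containing it. `reads` is an iterable of (barcode, sequence)
    pairs, a hypothetical input format. Reverse complements are ignored."""
    edge_barcodes = defaultdict(set)
    for barcode, seq in reads:
        for i in range(len(seq) - k + 1):
            edge_barcodes[seq[i:i + k]].add(barcode)
    return dict(edge_barcodes)


def adjacency(edge_barcodes):
    """Derive node adjacency: each (k-1)-mer prefix maps to the list of
    (k-1)-mer suffixes reachable through one k-mer edge."""
    graph = defaultdict(list)
    for kmer in edge_barcodes:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def shared_barcodes(edge_barcodes, e1, e2):
    """Barcodes observed on both edges; a crude signal that the two edges
    may come from the same long DNA fragment."""
    return edge_barcodes.get(e1, set()) & edge_barcodes.get(e2, set())


if __name__ == "__main__":
    reads = [("BC1", "ACGTAC"), ("BC1", "GTACGG"), ("BC2", "ACGTTT")]
    edges = build_barcoded_dbg(reads, k=4)
    graph = adjacency(edges)
    print(graph["CGT"])                              # ['GTA', 'GTT']
    print(shared_barcodes(edges, "ACGT", "CGTA"))    # {'BC1'}
    print(shared_barcodes(edges, "ACGT", "CGTT"))    # {'BC2'}
```

In this toy input, node CGT branches toward GTA and GTT, and the barcode sets on the adjacent edges show which branch each barcode supports. This only hints at why barcode information can help resolve ambiguous paths in the graph; it is not a description of the cloudSPAdes algorithm.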
Pages: i61-i70
Number of pages: 10