dipSPAdes: Assembler for Highly Polymorphic Diploid Genomes

Cited by: 53
Authors
Safonova, Yana [1]
Bankevich, Anton [1, 2]
Pevzner, Pavel A. [1, 3]
Affiliations
[1] Russian Acad Sci, Algorithm Biol Lab, St Petersburg Acad Univ, St Petersburg 194021, Russia
[2] St Petersburg State Univ, St Petersburg 199034, Russia
[3] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Funding
National Institutes of Health (USA)
Keywords
de Bruijn graphs; diploid genomes; genome assembly; SPAdes assembler; SEQUENCE; ALGORITHMS; GAUGE;
DOI
10.1089/cmb.2014.0153
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
While the number of sequenced diploid genomes has been steadily increasing in the last few years, assembly of highly polymorphic (HP) diploid genomes remains challenging. As a result, there is a shortage of tools for assembling HP genomes from next-generation sequencing (NGS) data. The initial approaches to assembling HP genomes were proposed in the pre-NGS era and are not well suited for NGS projects. To address this limitation, we developed dipSPAdes, the first de Bruijn graph assembler for HP genomes, which significantly improves on the state-of-the-art assemblers for HP diploid genomes.
Pages: 528-545
Page count: 18