Ragout-a reference-assisted assembly tool for bacterial genomes

Cited by: 129
Authors
Kolmogorov, Mikhail [1, 2]
Raney, Brian [3]
Paten, Benedict [3]
Son Pham [4]
Affiliations
[1] Russian Acad Sci, St Petersburg Univ, St Petersburg 196140, Russia
[2] Bioinformat Inst, St Petersburg, Russia
[3] UCSC, Santa Cruz, CA USA
[4] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
DOI
10.1093/bioinformatics/btu280
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Bacterial genomes are simpler than mammalian ones, and yet assembling the former from the data currently generated by high-throughput short-read sequencing machines still results in hundreds of contigs. To improve assembly quality, recent studies have utilized longer Pacific Biosciences (PacBio) reads or jumping libraries to connect contigs into larger scaffolds or help assemblers resolve ambiguities in repetitive regions of the genome. However, their popularity in contemporary genomic research is still limited by high cost and error rates. In this work, we explore the possibility of improving assemblies by using complete genomes from closely related species/strains. We present Ragout, a genome rearrangement approach, to address this problem. In contrast with most reference-guided algorithms, where only one reference genome is used, Ragout uses multiple references along with the evolutionary relationship among these references in order to determine the correct order of the contigs. Additionally, Ragout uses the assembly graph and multi-scale synteny blocks to reduce assembly gaps caused by small contigs from the input assembly. In simulations as well as real datasets, we believe that for common bacterial species, where many complete genome sequences from related strains have been available, the current high-throughput short-read sequencing paradigm is sufficient to obtain a single high-quality scaffold for each chromosome.
Pages: 302-309
Page count: 8
Related papers
17 in total
  • [1] Breakpoint graphs and ancestral genome reconstructions
    Alekseyev, Max A.
    Pevzner, Pavel A.
    [J]. GENOME RESEARCH, 2009, 19 (05) : 943 - 957
  • [2] SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
    Bankevich, Anton
    Nurk, Sergey
    Antipov, Dmitry
    Gurevich, Alexey A.
    Dvorkin, Mikhail
    Kulikov, Alexander S.
    Lesin, Valery M.
    Nikolenko, Sergey I.
    Son Pham
    Prjibelski, Andrey D.
    Pyshkin, Alexey V.
    Sirotkin, Alexander V.
    Vyahhi, Nikolay
    Tesler, Glenn
    Alekseyev, Max A.
    Pevzner, Pavel A.
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (05) : 455 - 477
  • [3] A hybrid approach for the automated finishing of bacterial genomes
    Bashir, Ali
    Klammer, Aaron A.
    Robins, William P.
    Chin, Chen-Shan
    Webster, Dale
    Paxinos, Ellen
    Hsu, David
    Ashby, Meredith
    Wang, Susana
    Peluso, Paul
    Sebra, Robert
    Sorenson, Jon
    Bullard, James
    Yen, Jackie
    Valdovino, Marie
    Mollova, Emilia
    Luong, Khai
    Lin, Steven
    Lamay, Brianna
    Joshi, Amruta
    Rowe, Lori
    Frace, Michael
    Tarr, Cheryl L.
    Turnsek, Maryann
    Davis, Brigid M.
    Kasarskis, Andrew
    Mekalanos, John J.
    Waldor, Matthew K.
    Schadt, Eric E.
    [J]. NATURE BIOTECHNOLOGY, 2012, 30 (07) : 701 - +
  • [4] Bergeron A, 2006, LECT NOTES COMPUT SC, V4175, P163
  • [5] Deshpande Viraj, 2013, Algorithms in Bioinformatics. 13th International Workshop, WABI 2013. Proceedings: LNCS 8126, P349, DOI 10.1007/978-3-642-40453-5_27
  • [6] TOWARD DEFINING COURSE OF EVOLUTION - MINIMUM CHANGE FOR A SPECIFIC TREE TOPOLOGY
    FITCH, WM
    [J]. SYSTEMATIC ZOOLOGY, 1971, 20 (04) : 406 - &
  • [7] Gaul É, 2006, LECT NOTES COMPUT SC, V4205, P113
  • [8] Reference-assisted chromosome assembly
    Kim, Jaebum
    Larkin, Denis M.
    Cai, Qingle
    Asan
    Zhang, Yongfen
    Ge, Ri-Li
    Auvil, Loretta
    Capitanu, Boris
    Zhang, Guojie
    Lewin, Harris A.
    Ma, Jian
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2013, 110 (05) : 1785 - 1790
  • [9] Hybrid error correction and de novo assembly of single-molecule sequencing reads
    Koren, Sergey
    Schatz, Michael C.
    Walenz, Brian P.
    Martin, Jeffrey
    Howard, Jason T.
    Ganapathy, Ganeshkumar
    Wang, Zhong
    Rasko, David A.
    McCombie, W. Richard
    Jarvis, Erich D.
    Phillippy, Adam M.
    [J]. NATURE BIOTECHNOLOGY, 2012, 30 (07) : 692 - +
  • [10] Reconstructing contiguous regions of an ancestral genome
    Ma, Jian
    Zhang, Louxin
    Suh, Bernard B.
    Raney, Brian J.
    Burhans, Richard C.
    Kent, W. James
    Blanchette, Mathieu
    Haussler, David
    Miller, Webb
    [J]. GENOME RESEARCH, 2006, 16 (12) : 1557 - 1565