Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

Citations: 838
Authors
Li, Heng [1]
Affiliations
[1] Broad Inst, Med Populat Genet, Cambridge, MA 02142 USA
Keywords
GENOMES; GENERATION; ALIGNMENT; BLAST
DOI
10.1093/bioinformatics/btw152
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore Technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads.
Results: We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as that of raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between our tools and current tools.
Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm
Pages: 2103-2110
Page count: 8