Fast and accurate de novo genome assembly from long uncorrected reads

Cited by: 1755
Authors
Vaser, Robert [1]
Sovic, Ivan [2]
Nagarajan, Niranjan [3]
Sikic, Mile [1, 4]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Dept Elect Syst & Informat Proc, Zagreb 10000, Croatia
[2] Rudjer Boskovic Inst, Ctr Informat & Comp, Zagreb 10000, Croatia
[3] Genome Inst Singapore, Singapore 138672, Singapore
[4] Bioinformat Inst, Singapore 138671, Singapore
Keywords
SEQUENCING READS; ALGORITHM; ALIGNMENT; NONHYBRID
DOI
10.1101/gr.214270.116
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
Pages: 737-746
Page count: 10
Related papers
18 in total
  • [1] Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
    Berlin, Konstantin
    Koren, Sergey
    Chin, Chen-Shan
    Drake, James P.
    Landolin, Jane M.
    Phillippy, Adam M.
    [J]. NATURE BIOTECHNOLOGY, 2015, 33 (06) : 623 - +
  • [2] Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
    Chaisson, Mark J.
    Tesler, Glenn
    [J]. BMC BIOINFORMATICS, 2012, 13
  • [3] Chin CS, 2016, NAT METHODS, V13, P1050, DOI 10.1038/nmeth.4035
  • [4] Chin CS, 2013, NAT METHODS, V10, P563, DOI 10.1038/nmeth.2474
  • [5] Delcher Arthur L, 2003, Curr Protoc Bioinformatics, VChapter 10, DOI 10.1002/0471250953.bi1003s00
  • [6] AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES
    GOTOH, O
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) : 705 - 708
  • [7] Istace B., 2016, BIORXIV, DOI 10.1101/066613
  • [8] Generating consensus sequences from partial order multiple sequence alignment graphs
    Lee, C
    [J]. BIOINFORMATICS, 2003, 19 (08) : 999 - 1008
  • [9] Multiple sequence alignment using partial order graphs
    Lee, C
    Grasso, C
    Sharlow, MF
    [J]. BIOINFORMATICS, 2002, 18 (03) : 452 - 464
  • [10] Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
    Li, Heng
    [J]. BIOINFORMATICS, 2016, 32 (14) : 2103 - 2110