Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Cited by: 1523
Authors
Cheng, Haoyu [1, 2]
Concepcion, Gregory T. [3]
Feng, Xiaowen [1, 2]
Zhang, Haowen [4]
Li, Heng [1, 2]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] Pacific Biosci, Menlo Pk, CA USA
[4] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
National Institutes of Health (USA)
Keywords
GENOME; ACCURATE; READS;
DOI
10.1038/s41592-020-01056-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
Pages: 170 / +
Page count: 10
Related papers
35 in total
  • [21] A synthetic-diploid benchmark for accurate variant-calling evaluation
    Li, Heng
    Bloom, Jonathan M.
    Farjoun, Yossi
    Fleharty, Mark
    Gauthier, Laura
    Neale, Benjamin
    MacArthur, Daniel
    [J]. NATURE METHODS, 2018, 15 (08) : 595 - +
  • [22] Minimap2: pairwise alignment for nucleotide sequences
    Li, Heng
    [J]. BIOINFORMATICS, 2018, 34 (18) : 3094 - 3100
  • [23] Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
    Li, Heng
    [J]. BIOINFORMATICS, 2016, 32 (14) : 2103 - 2110
  • [24] Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly
    Li, Heng
    [J]. BIOINFORMATICS, 2012, 28 (14) : 1838 - 1844
  • [25] Martin M., 2016, bioRxiv
  • [26] Aggressive assembly of pyrosequencing reads with mates
    Miller, Jason R.
    Delcher, Arthur L.
    Koren, Sergey
    Venter, Eli
    Walenz, Brian P.
    Brownley, Anushka
    Johnson, Justin
    Li, Kelvin
    Mobarry, Clark
    Sutton, Granger
    [J]. BIOINFORMATICS, 2008, 24 (24) : 2818 - 2824
  • [27] The fragment assembly string graph
    Myers, EW
    [J]. BIOINFORMATICS, 2005, 21 : 79 - 85
  • [28] A fast bit-vector algorithm for approximate string matching based on dynamic programming
    Myers, G
    [J]. JOURNAL OF THE ACM, 1999, 46 (03) : 395 - 415
  • [29] HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads
    Nurk, Sergey
    Walenz, Brian P.
    Rhie, Arang
    Vollger, Mitchell R.
    Logsdon, Glennis A.
    Grothe, Robert
    Miga, Karen H.
    Eichler, Evan E.
    Phillippy, Adam M.
    Koren, Sergey
    [J]. GENOME RESEARCH, 2020, 30 (09) : 1291 - 1305
  • [30] Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
    Porubsky, David
    Ebert, Peter
    Audano, Peter A.
    Vollger, Mitchell R.
    Harvey, William T.
    Marijon, Pierre
    Ebler, Jana
    Munson, Katherine M.
    Sorensen, Melanie
    Sulovari, Arvis
    Haukness, Marina
    Ghareghani, Maryam
    Lansdorp, Peter M.
    Paten, Benedict
    Devine, Scott E.
    Sanders, Ashley D.
    Lee, Charles
    Chaisson, Mark J. P.
    Korbel, Jan O.
    Eichler, Evan E.
    Marschall, Tobias
    [J]. NATURE BIOTECHNOLOGY, 2021, 39 (03) : 302 - 308